Posts Tagged ‘Sedona Conference’

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This blog post explores the promise of predictive coding technology, examines why it has not been widely adopted in eDiscovery, and explains why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page-by-page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations.  Additionally, since computers don’t get tired or daydream, many believe computers can even predict document relevance better than their human counterparts.
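To make the ranking idea concrete, here is a toy sketch in Python. It is not any vendor's actual algorithm. It assumes a small seed set of documents that an attorney has already coded as relevant or not relevant, learns a smoothed log-odds weight for each word, and then scores and ranks the uncoded documents. The function names (`train_term_weights`, `rank_documents`) are hypothetical, and real predictive coding tools use considerably more sophisticated classifiers.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_term_weights(seed_docs):
    """Learn a weight per term from (text, is_relevant) pairs coded by an attorney.

    Each weight is a smoothed log-odds score: positive when a term shows up
    more often in relevant seed documents, negative when it favors non-relevant ones.
    """
    rel, nonrel = Counter(), Counter()
    for text, is_relevant in seed_docs:
        # Count each term once per document (document frequency).
        (rel if is_relevant else nonrel).update(set(tokenize(text)))
    n_rel = sum(1 for _, is_relevant in seed_docs if is_relevant)
    n_non = len(seed_docs) - n_rel
    vocab = set(rel) | set(nonrel)
    # Add-one smoothing keeps unseen combinations from producing log(0).
    return {
        t: math.log((rel[t] + 1) / (n_rel + 2)) - math.log((nonrel[t] + 1) / (n_non + 2))
        for t in vocab
    }


def rank_documents(weights, docs):
    """Score uncoded documents and return (score, doc) pairs, most likely relevant first."""
    scored = [(sum(weights.get(t, 0.0) for t in set(tokenize(d))), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With a seed set in which "merger" and "acme" appear only in relevant documents, a new document mentioning both rises to the top of the ranking, while documents that score lowest are the ones that might, in some situations, be set aside without review.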

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions entered into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine-tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user.  This fundamental problem could lead to improper sampling that taints the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or why coding decisions were applied to some documents and not others.
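Sampling does not have to be a black box.  The sketch below shows one common way to size and draw a simple random sample: the normal-approximation sample-size formula with a conservative assumed rate of 50%, an optional finite population correction, and a seeded random draw.  The function names are hypothetical and the formula is a standard statistical choice, not the method of any particular product.

```python
import math
import random
from statistics import NormalDist


def sample_size(confidence, margin_of_error, population=None, p=0.5):
    """Number of documents to sample to estimate a proportion within +/- margin_of_error.

    Uses n0 = z^2 * p * (1 - p) / e^2, where p = 0.5 is the most conservative
    assumption about the unknown rate.  If the population size is known,
    the finite population correction shrinks the sample slightly.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def draw_sample(doc_ids, n, seed):
    """Simple random sample without replacement; recording the seed makes it reproducible."""
    return random.Random(seed).sample(list(doc_ids), n)
```

At 95% confidence and a ±2% margin of error, `sample_size(0.95, 0.02)` returns 2401; at ±5% it returns 385.  Recording the seed lets the opposing party reproduce exactly which documents were sampled, which addresses the transparency concern directly.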

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and the metadata fields of a document, some technologies ignore metadata entirely.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” appears only in the “recipient” metadata field and not in the document text.  The obvious risk is privilege waiver if that email is inadvertently produced to the opposing party.

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern which language led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive set of documents, many of them irrelevant.  This problem is sometimes described as ‘excessive recall,’ although in information-retrieval terms it is a problem of low precision.  Either way, it matters because every irrelevant document retrieved must still be reviewed manually, which directly increases eDiscovery cost.
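The difference between recall and precision is easiest to see with numbers.  The sketch below uses the standard information-retrieval definitions; the document counts are invented purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents 0-9 are the truly relevant ones.
relevant = set(range(10))
# An over-inclusive pass that retrieves 100 documents, including all 10 relevant ones.
over_inclusive = set(range(100))
# A tighter pass that retrieves 12 documents but misses one relevant document.
tighter = set(range(9)) | {50, 51, 52}

print(precision_recall(over_inclusive, relevant))  # (0.1, 1.0)
print(precision_recall(tighter, relevant))         # (0.75, 0.9)
```

The over-inclusive pass has perfect recall but precision of only 0.1, so 90 of the 100 documents sent to reviewers are irrelevant.  It is that low precision that drives review cost.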

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem vague, but transparency is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will, for the first time, shine a light into the black box of predictive coding technology.  In simple terms, increasing transparency will simplify the user experience and improve accuracy, which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized, which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.
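As an illustration of how a confidence level becomes a number the parties can debate, the hypothetical sketch below estimates the rate of relevant documents remaining in a set the producing party does not plan to review, based on a random sample of that set.  It uses the simple normal-approximation (Wald) interval.  The sample figures are invented, and for rates near zero a method such as the Wilson score interval is usually a better choice.

```python
import math
from statistics import NormalDist


def proportion_interval(found, sample_size, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a rate observed in a random sample.

    found: relevant documents found in the sample
    sample_size: documents reviewed in the sample
    """
    p = found / sample_size
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    # Clamp to [0, 1] because a rate cannot fall outside that range.
    return max(0.0, p - margin), min(1.0, p + margin)
```

If reviewers find 8 relevant documents in a random sample of 400, `proportion_interval(8, 400)` gives roughly 0.6% to 3.4% at 95% confidence.  The opposing party can then argue about whether that range is acceptable, rather than relying solely on the producing party's representations.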

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about ease of use and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend.  Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.

Amending the FRCP: More Questions than Answers

Friday, October 14th, 2011

Outcry from many in the legal community has caused a number of groups to consider whether the Federal Rules of Civil Procedure (FRCP) should be amended.  The dialogue began in earnest a year ago at the Duke Civil Litigation Conference and picked up speed following an eDiscovery “mini-conference” held in Dallas last month (led by the Discovery Subcommittee appointed by the Advisory Committee on Civil Rules).  The rules amendment topic is so hot that the Sedona Conference (WG1) spent most of its two-day annual meeting discussing the need for amendments and evaluating a range of competing proposals.

During this dialogue (which I can’t quote verbatim) a number of things became clear to me…

1.  This rules amendment quandary is a bit of a chicken-and-egg riddle: it’s hard to wholeheartedly support a rules change if there isn’t a good consensus on what a particular change would accomplish and what the long-term consequences might be as technology quickly morphs.  As an example, if a redefined preservation trigger started the duty to preserve only when there was a reasonable “certainty” of litigation (versus a mere “likelihood”), would this really make a material impact?  Or would this inquiry still be as highly fact-specific as it is today?  And would it still be prone to the inevitable 20/20 hindsight judgments?

2. While it is clear that preservation has become a more complex and risk-laden process, it’s not clear that this “pain” is causally related to the FRCP.  The notes from the Dallas mini-conference quoted a pending Sedona survey showing that an overwhelming majority of respondents saw preservation challenges increasing:

“[S]ome trends can be noted. 95% (of the surveyed members) agreed that preservation issues were more frequent. 75% said that development was due to the proliferation of information.”

3. Another camp of stakeholders complains that the existing rules (as amended in 2006) aren’t being followed by practitioners or understood by the judiciary.  While this may be the case, it raises a critical question: if folks aren’t following the amended rules (utilizing proportionality, leveraging FRE 502, etc.), is it really reasonable to think that any new rules would be followed this time around?

4. The role of technology in easing the preservation burden represents another murky area for debate.  For example, it could be argued that preservation pains (i.e., costs) are only really significant for organizations that haven’t deployed state-of-the-art information governance solutions (e.g., legal hold solutions, email archives, records retention software, etc.) to make the requisite tasks less manual.

5. And finally, even assuming that the FRCP is magically re-jiggered to ease preservation costs, this would only impact organizations with litigation in Federal court. This leaves many still exposed to varying standards for the preservation trigger, scope and associated sanctions.

So, in the end, it’s unclear what the future holds for an amended FRCP landscape.  Given the range of divergent perspectives, differing viewpoints on potential solutions and the time necessary to navigate the Rules Enabling Act, the only thing that’s clear is that the cavalry isn’t coming to the rescue any time soon.  This means that organizations with significant preservation pains should endeavor to better utilize the rules that are on the books and deploy enabling technologies where possible.

A Judicial Perspective: Q&A With Former United States Magistrate Judge Ronald J. Hedges Regarding Possible Discovery Related Rule Changes

Friday, September 9th, 2011

If you have been following my previous posts regarding possible amendments to the Federal Rules of Civil Procedure (Rules), then you know I promised a special interview with former United States Magistrate Judge Ron Hedges.  The timing of the discussion is perfect considering that a “mini-conference” is being hosted by a Federal Rules Discovery Subcommittee today (September 9th) in Dallas, TX.  The debate will focus on whether or not the Rules should be amended to address evidence preservation and sanctions.  I am attending the mini-conference and will summarize my observations as part of my next post.  In the meantime, please enjoy reading the dialogue below for a glimpse into Judge Hedges’ perspective regarding possible Rule amendments.

Nelson: You were recently quoted in a Law Technology News (LTN) article written by Evan Koblentz as saying, “I don’t see a need to amend the rules” because these rules haven’t been around long enough to see what happens.  Isn’t almost five years long enough?

Judge Hedges: No.  For the simple reason that both attorneys and judges continue to need education on the 2006 amendments and, more particularly, they need to understand the technologies that create and store electronic information.  The amendments establish a framework within which attorneys and judges make daily decisions on discovery.  I have not seen any objective evidence that the framework is somehow failing and needs further amendment.

Nelson: You also said the “big problem” is that people don’t talk enough.  What did you mean?  Hasn’t the Sedona Cooperation Proclamation made a difference?

Judge Hedges: The centerpiece of the 2006 amendments (at least in my view) is Rule 26(f).  I think it is fair to say that the legal community’s response to 26(f) has been, to say the least, varied. Civil actions with large volumes of ESI that may be discoverable under Rule 26(b)(1) cry out for extensive 26(f) meet-and-confer discussions that may take a number of meetings and require the presence of party representatives from, for example, IT.  There is an element of trust required between adversary counsel (with the concurrence of the parties they represent) that may be difficult to establish – but some cooperation is necessary to make 26(f) work.  Overlay that reality with our adversary system and the duty of attorneys to zealously advocate on behalf of their clients and you can understand why cooperation isn’t always a top priority for some attorneys.

However, “transparency” in discussing ESI is essential, along with advocacy and the need to maintain appropriate confidentiality. That’s where the Sedona Conference Proclamation can make a big difference. Has the Proclamation done that? It’s too early to reach a conclusion on that question, but the Proclamation is often cited and, as education progresses in eDiscovery, I am confident that the Proclamation will be recognized as a means to realize the just, speedy, and inexpensive resolution of litigation, as articulated under Rule 1.

Nelson: You also mentioned that the Federal Rules Advisory Committee might be running afoul of the Rules Enabling Act.  Can you explain?

Judge Hedges: There is a distinction between “procedural” and “substantive” rules.  The Rules Enabling Act governs the adoption of the former.  Rule 502 of the Federal Rules of Evidence is an example of a substantive rule that was proposed by the Judicial Conference.  However, since Rule 502 is a rule dealing with substantive privilege and waiver issues, it had to be enacted into law through an Act of Congress.  I am concerned that proposals to further amend the Federal Rules of Civil Procedure may cross the line from procedural to substantive.  I am not prepared to suggest at this time, however, that anything I have seen has crossed the line.  Stay tuned.

Nelson: If you had to select one of the three options currently being considered (see page 264), which option would you select and why?

Judge Hedges: To start, I would not choose option 1, which presumes that the Rules can reach pre-litigation conduct consistent with the Rules Enabling Act.  My concern here is also that, in the area of electronic information, a too-specific rule risks “overnight” obsolescence, just as the Electronic Communications Privacy Act, enacted in 1986, is considered by a number of commentators to be, at best, obsolescent.  Note also that I did not use the word “stored” when I mentioned electronic information, as courts have already required that so-called ephemeral information be preserved.  Nor would I choose option 2.  Absent seeing more than the brief description of the category on page 264, it seems to me that option 2 is likely to do nothing more than be a restatement of the existing law on when the duty to preserve is “triggered.”

So, by default, I am forced to choose option 3.  I presume a rule would say something like, “sanctions may not be imposed on a party for loss of ESI (or “EI”) if that party acted reasonably in making preservation decisions.”  There are a number of problems here. First, in a jurisdiction which allows the imposition of at least some sanction for negligence, all the rule would likely do is be interpreted to foreclose “serious” sanctions. Isn’t that correct? Or is the rule intended to supersede existing variances in the law of sanctions?  At that point, does the rule become “substantive”?   Second, how will “reasonableness” be defined?  Reasonableness supposes the existence of a duty – in this case, a duty to preserve.  For example, is there a duty to preserve ephemeral data that a party knows is relevant?  We come back full circle to where we began.

Remember, Rule 37(f) (now 37(e)) was intended to provide some level of protection against the imposition of sanctions, just as the categories are intended to.  Right?  And five years later 37(e) remains defined variously to be a “safe harbor” or a “lighthouse” by some lawyers such as Jonathan Redgrave or an “uncharted minefield” by others like me.

Nelson: What about heightened pleading standards after the Iqbal and Twombly decisions?  Do these decisions have any relevance to electronic discovery and the topic at hand?

Judge Hedges: Let me begin by saying that I am no fan of Twombly or Iqbal. The decisions, however well intended, have led to undue cost and delay all too often.  Not only is motion to dismiss practice costly for parties, but it imposes great burdens on the United States Courts and, as often as not, leads to at least one other round of motion practice as plaintiffs are given leave to re-plead.  All the while, parties have preservation obligations to fulfill and, in the hope of saving expense, discovery is often stayed until a motion is “finally” decided.  I would like to see objective evidence of the delay and cost of this motion practice (and I expect that the Administrative Office of the United States Courts has statistical evidence already).  I would also like to see objective evidence from defendants distinguishing between the cost of motion practice and later discovery costs.

Putting all that aside, and if I had to accept one option, I would choose to allow some discovery that is integrated with the motion practice.  First, even without the filing of a responsive pleading, there should be a 26(f) meet-and-confer to discuss, if nothing else, the nature and scope of preservation and the possibility of securing a Rule 502(d) order. Second, while I have serious concerns about “pre-answer discovery” for a number of reasons, I would have the parties make 26(a)(1) disclosures while a motion to dismiss is pending or leave to re-plead has been granted in order to address the likely “asymmetry of information” between a plaintiff and a moving defendant.  Once the disclosures are made, I would allow the plaintiff to secure some information identified in the disclosures to allow re-pleading and perhaps obviate the need for continued motion practice.

All of this would, of course, require active judicial management.  And one would hope that Congress, which seems so interested in conserving resources, would recognize the vital role of the United States Courts in securing justice for everyone and give adequate funding to the Courts.

Bit by Bit: Building a Better eDiscovery Collection Solution

Friday, July 29th, 2011

Is there a place in eDiscovery today for hard drive imaging and bit-by-bit copies, which collect deleted items or slack/unused hard disk space?  The answer is yes, with some important limitations.  For the vast majority of matters, ESI can be collected without imaging drives or utilizing proprietary container files.  However, I occasionally still encounter folks who are victims of the dated and costly misconception that eDiscovery always requires the bit-level imaging of hard drives.

There are situations, though, where the existence of data (as opposed to its content) is central to the matter – when companies suspect employees of stealing proprietary information or when employees leave a company under suspicious circumstances.  In these and other similar situations, it may make sense to have the employee’s workstation hard drive imaged for full forensic analysis.  Even in these scenarios, I find that companies are more likely to hire an external investigator to perform this task to allay suspicions of tampering or bias, and the company generally would prefer that this investigator be the one to testify about this sensitive data acquisition.  Then, for ESI beyond the target employee’s hard drive, other collection methods may be used.  As we’re now midway through 2011 – a year in which I expect to see eDiscovery fully embraced by many corporations as a true business process – I wanted to analyze why the forensic disk image myth still exists, where it came from, and what the law really requires of an eDiscovery collections process.

Traditionally, cases that mentioned full forensic imaging of hard drives began their captions with United States v. or State v. because they were criminal matters.  In traditional civil litigation – even the behemoth eDiscovery cases that get all the bloggers blogging – forensic imaging simply is not required or needed.  In fact, in most cases, it will dramatically increase the cost associated with electronic discovery, because the process adds unnecessary complexity in downstream phases of eDiscovery and leads to vast over-collection.  Why collect the Microsoft Office suite 50 times when what you are really required to preserve and collect are the files created with those programs?  When using disk imaging, program files are collected, which drives up storage costs and requires the post-collection step of deNISTing (removing known system and program files by matching their hash values against reference lists published by NIST).  Why not leave those system files behind and perform a targeted collection of only user-created content?  In addition, the primary rules governing civil litigation – the Federal Rules of Civil Procedure and Federal Rules of Evidence – simply do not require exact duplication of electronic files.  I am amazed that so many experts are still pushing full forensic imaging and duplication in every case.  In fact, this goes against best practices published by The Sedona Conference and EDRM, and in the E-Discovery textbook co-authored by Judge Shira A. Scheindlin.
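deNISTing is, at its core, a hash lookup.  The sketch below assumes you already have a set of SHA-1 values for known system and application files.  NIST's National Software Reference Library publishes such hash sets, but loading its distribution format is not shown here, and the `known_file_hashes` argument is a stand-in for that data.  Any collected file whose hash appears in the set is dropped.  The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Uppercase hex SHA-1 of a file, read in chunks so large files are not loaded at once."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def denist(paths, known_file_hashes):
    """Keep only files whose SHA-1 is NOT in the known-file hash set.

    known_file_hashes: hypothetical input, e.g. hashes extracted from a
    reference data set of known software files.
    """
    known = {h.upper() for h in known_file_hashes}  # normalize case for comparison
    return [Path(p) for p in paths if sha1_of(p) not in known]
```

A targeted collection that never copies program files in the first place makes this filtering step largely unnecessary.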

In comment 8c of the Sedona Principles, the authors call making forensic image backups of computers “the first step of an expensive, complex, and difficult process of data analysis that can divert litigation into side issues and satellite disputes involving the interpretation of potentially ambiguous forensic evidence.”  The comment goes on to say that “it should not be required unless exceptional circumstances warrant the extraordinary cost and burden.”  In a whitepaper authored for EDRM by three eDiscovery experts from KPMG, LLC, the authors discussed the high cost of forensic bit-level imaging and, instead, suggested that targeted collection of ESI would be sufficient in the vast majority of non-criminal matters.  They state, “[t]he challenge of Smart EDM [Evidence and Discovery Management] is to obtain targeted files in a forensically sound manner – chain-of-custody established, proven provenance, and metadata intact – without having to resort to drive imaging.”

In Electronic Discovery and Digital Evidence: Cases and Materials, written by Judge Shira A. Scheindlin, Daniel J. Capra, and The Sedona Conference, the authors state that,

“because imaging software is commonly available, and because the vast majority of training programs in the field of electronic discovery revolve around forensics, there is a growing tendency to want to ‘image everything.’  But unless an argument can be made that the matter at hand will benefit from a forensic collection and additional examination, there is no reason to do a forensic collection just because the technology exists to do it.”

So, with the top experts in the field saying the days of “image everything” should be over, why does it still happen?  Why are the victims of this antiquated workflow still paying the exorbitant costs of a solution that does not really meet their requirements?  Perhaps a historical perspective will be helpful in explaining.

Why Drive Imaging and Proprietary Containers?

I do not think there is any debate on the benefit of having a bit-level image of a hard drive in a criminal investigation.  However, traditionally, the investigators using these methods needed a way to get the imaged drive safely back to a lab for further analysis.  Companies or law enforcement agencies that hired third-party investigators to image drives had to transport the data, maintaining chain of custody, and preserving all contents in an un-alterable state through several phases of the investigation.  And, in criminal matters, it was especially important to maintain the integrity of the evidence when the electronic evidence was central to the government’s case.  Remember, the burden of proof in a criminal matter is “beyond a reasonable doubt” (along with a host of constitutional considerations).  Alteration of key evidence could certainly create reasonable doubt and hose the prosecution’s case (or, worse, the evidence gets tossed by the Court before the trial even begins).  The container file ensures that no matter who handles the evidence, checksums can prove that the contents were not altered since the initial imaging.

Many vendors now offer logical image containers as an alternative to doing a full bit-level image of the drive.  However, in corporate eDiscovery, this is still overkill because the tools and solutions being used downstream still have to unpack or parse these proprietary container formats for processing and analysis.  In fact, even software from the vendors who created these container formats must “crack them open” to get to the contents within.  This seems to add a layer of complexity that has not been needed since the days of the external examiner coming in with her forensic toolkit to do drive images.  The format was created to solve a very specific problem, and little thought was given to the use of this format in a holistic process like what is typically seen in civil eDiscovery.  There is no longer a need for a container for portability of evidence because it is most likely going to be processed in place after collection while residing on a secure evidence store on the company’s network.  I have heard “what if our collection methods are challenged?”  And to that, I would respond that we are not in criminal court and that the requirement in civil court is reasonableness, not perfection.  Now, if an employee is suspected of wrongdoing and the potential deletion of files will dramatically alter the case, then by all means, hire a forensic investigator and follow all of the protocols established over the last several decades in computer forensic science.

Fast forward to the 21st century

Corporations are bringing eDiscovery in-house; they are building a business process around it to minimize risk and drive enormous cost savings, and in today’s world of civil litigation, there simply is not a need for these drive images or proprietary containers.  First of all, the burden of proof in a civil matter is “by a preponderance of the evidence,” which is satisfied if there is a greater than 50% chance that a proposition is true.  This is a much lower standard than in criminal cases.  But the burden of proof goes more to the weight the court or jury gives the evidence.  Before weight is even considered, evidence must clear several hurdles of admissibility.  As we will explore, vendors have, over the years, significantly overstated what these admissibility standards actually require.

The Path to Admissibility

There are several hurdles to admissibility for any type of evidence, and because they are not within the scope of this post, I will forgo any discussion of relevance, FRE 403, or the hearsay rules.  I will focus on the issues that tend to be associated with electronic evidence: authentication and the “best evidence rule”.  There are some examiners and perhaps even vendors that would argue electronic evidence is simply not admissible if not collected using bit-level imaging (and sometimes two copies, one that examiners refer to as the “best evidence” copy and another “working copy” to be analyzed).  This is simply not true.  What we will find is that the collection method goes more to the weight of the evidence than to the minimum showing needed for admissibility (hence the discussion of burden of proof above).

All evidence must be authenticated pursuant to FRE 901.  This is a “don’t pass Go” threshold requirement for admissibility.  FRE 901 is satisfied by “evidence sufficient to support a finding that the matter in question is what its proponent claims.”  Unless an exhibit is “self-authenticating” under FRE 902, the proponent must establish its identity by stipulation, circumstantial evidence, or the testimony of a witness with knowledge of its identity and authorship.  Objections at this stage typically argue that the exhibit is not an original, that it was altered, or that the witness through whom the proponent seeks to authenticate the exhibit cannot do so because of a lack of personal knowledge or some other defect.

Most of these objections concern the authenticity of the exhibit’s contents, and the rules in Article X of the FRE are helpful here.  Rule 1001 defines an “original” with respect to data stored in a computer or similar device as “any printout or other output readable by sight, shown to reflect the data accurately.”  This is a far cry from a bit-by-bit forensic image!  Rule 1002, often referred to as the “Best Evidence Rule,” requires that “[t]o prove the content of a writing, recording, or photograph, the original writing, recording, or photograph is required, except as otherwise provided in these rules or by Act of Congress.”  Not only do these rules not require exact duplication of the electronic files, but they also do not require imaging an entire 80GB hard drive to collect the 100MB of files that are potentially relevant to the case.  What they do require is the ability to show that the document being proffered is the same document that was originally created.  In re Vee Vinhnee, 336 B.R. 437, 444 (B.A.P. 9th Cir. 2005).  Judge Grimm also sets out an extremely comprehensive analysis of what is required for the admissibility of electronic evidence in civil litigation in Lorraine v. Markel American Insurance Company, 241 F.R.D. 534 (D. Md. May 4, 2007).  In Lorraine, he notes that In re Vee Vinhnee may set out the most demanding test for the admissibility of ESI.

Maintaining Forensic Integrity

So, how do I counter claims such as “they must have altered that document” or “Your Honor, I swear that line about ‘acceptable losses’ was not in the safety memo when I created it”? This is where hash values become a wonderful thing. Computing a file’s hash value, a fixed-length checksum (usually written in hexadecimal) calculated from the file’s contents, is essentially like recording the file’s DNA: if the file is altered, its hash value will be different. So, by computing the hash value at the source, in transit, and at the destination, I can show that the electronic file is in exactly the same state as it was at the source (that is, that the collected document is the same as the document originally created). Add the ability to report on those hash values, and container files and full forensic disk images really do become extreme overkill.
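To make the DNA analogy concrete, here is a minimal Python sketch using the standard library’s hashlib module. SHA-256 is my assumption (MD5 and SHA-1 are also widely used by collection tools), and the file name and memo text are invented for illustration. It simply shows that re-hashing an unchanged file reproduces the same value, while adding a single sentence produces a completely different one.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_file(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Stream a file through a hash function and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    memo = Path(tmp) / "safety_memo.txt"  # hypothetical file
    memo.write_bytes(b"Inspect all valves before Friday.\n")
    at_source = hash_file(memo)

    # Re-hashing the unchanged file reproduces the source digest.
    unchanged = hash_file(memo) == at_source

    # Adding one sentence changes the digest completely.
    memo.write_bytes(b"Inspect all valves before Friday. Acceptable losses.\n")
    altered = hash_file(memo) == at_source

print(unchanged, altered)  # True False
```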

The important distinction here is that the term “forensic” does not refer to a type of technology or to the products of a specific vendor, despite claims and propaganda to the contrary. “Forensic” refers to the methodology used by the person collecting the evidence, whether it is fingerprints from a weapon or electronic files from an employee’s laptop. Forensic imaging, by contrast, refers to the process by which an entire hard disk is copied bit by bit, in a forensic manner, to create an exact duplicate of that drive. A collection of ESI can be entirely “forensically sound” simply by taking hash values at each stage of the process, as described above, to prove that the files were not altered during collection. As long as chain of custody is also maintained (much easier to do now that we are not using multiple tools, vendors, locations, and people to do the job), the process should meet the threshold admissibility requirements of the Federal Rules of Evidence.
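The same idea extends to chain of custody: record a hash at every hand-off and check that each one matches the value taken at the source. The sketch below is a hypothetical illustration, not any vendor’s format; the stage names, the handler labels, and the CustodyEntry record layout are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CustodyEntry:
    stage: str        # e.g. "source", "in transit", "preservation store"
    handled_by: str   # person or system responsible at this stage
    sha256: str       # hash computed at this stage
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def record(log: list[CustodyEntry], stage: str, handled_by: str, data: bytes) -> None:
    """Hash the file contents at this stage and append an entry to the log."""
    log.append(CustodyEntry(stage, handled_by, hashlib.sha256(data).hexdigest()))

def verify(log: list[CustodyEntry]) -> list[str]:
    """Return the stages whose hash differs from the hash taken at the source."""
    source_hash = log[0].sha256
    return [entry.stage for entry in log[1:] if entry.sha256 != source_hash]

contents = b"Q3 pricing spreadsheet"  # hypothetical collected file
log: list[CustodyEntry] = []
record(log, "source", "collection agent", contents)
record(log, "in transit", "collection agent", contents)
record(log, "preservation store", "litigation support", contents)
print(verify(log))  # [] -> every stage matches the source

record(log, "production", "vendor", contents + b" (edited)")
print(verify(log))  # ['production']
```

In practice each entry would also need to be stored somewhere tamper-evident; the in-memory list here only illustrates the comparison logic.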

Opponents will still claim that the evidence must have been altered, or an expert familiar only with forensic imaging technologies will argue that only vendor X’s technology is “court vetted,” so any other method is unacceptable. To these opponents, I would make two points:

  1. No technology is “court vetted.” What a court accepts is the operator’s use of a technology in a specific case (in a specific jurisdiction) as meeting the threshold showings required by FRE 901, 1001, and 1002, as well as any rules of procedure governing discovery in a civil or criminal matter. Wow, that would be a very long footnote on a marketing slide…probably why it is not usually mentioned.
  2. The process is forensically sound: you can prove that the documents were not altered from collection through production by referencing the hash values and by maintaining copies of the analyzed native originals on a secured preservation store. This exceeds the requirements of FRE 901, 1001, and 1002, and it also protects against arguments going to the “weight” of the evidence from opponents who would cry foul.

What Now?

So, where does all of this leave us? First, in the vast majority of civil litigation matters involving electronic discovery, forensic bit-by-bit imaging of computer hard drives is simply not required. Vendors have promoted this practice over the years, but it has only over-complicated the eDiscovery process for many unsuspecting litigants and dramatically increased costs, because the model simply does not scale. Moreover, these vendors and overzealous consultants often overlook the effort and cost required to handle full drive images downstream in the process. Second, we now know there is a better way: targeted, forensically sound collection of ESI using streamlined, automated solutions that maintain the custodian relationship (even for shared data sources) throughout the eDiscovery lifecycle, preventing form-of-production disputes and other calamities that have plagued this industry for the last decade. This approach to collecting ESI can deliver substantial cost savings all the way through production.

Patents and Innovation in Electronic Discovery

Monday, June 13th, 2011

In the world of technology we live in, enormous benefit is created when people apply well-known techniques to solve problems and create value for the broader community. Such techniques are often the result of long and painstaking research, driven primarily by academic institutions, with private industry either funding that research directly or incorporating it into its own work. When the industry as a whole recognizes a methodology, it gains widespread use.

In information retrieval, searching for and retrieving relevant content from unstructured text has been a vexing problem, and for decades the brightest minds have applied their collective intelligence and the rigors of peer review to establish the most effective ways to solve retrieval problems. Research forums such as TREC, SIGIR, and other information retrieval conferences provide a venue for advancing the state of the art. So, when Recommind announced that it had been issued a patent on Predictive Coding, I took notice, especially since the news touches a nerve with those who believe research should be openly shared.

The patent lists six claims describing a workflow in which humans review and code a sample of documents, and the coding decisions applied to that sample are then projected onto the larger collection. Anyone with even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback has been studied for well over forty years, with a paper as early as 1968 by J.J. Rocchio titled “Relevance Feedback in Information Retrieval.” It falls under a broad category of methods known as machine learning.
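For readers who have not seen it, the core of Rocchio’s method is a single update rule: move the query toward the centroid of the documents a reviewer marked relevant and away from the centroid of those marked non-relevant. The Python sketch below is a toy illustration using sparse term-weight dictionaries; the weights alpha = 1.0, beta = 0.75 and gamma = 0.15 are commonly cited textbook values that I am assuming here, and the documents are invented.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback on sparse term-weight vectors (dicts).

    new query = alpha * query
                + beta  * centroid(relevant documents)
                - gamma * centroid(non-relevant documents)
    Terms whose weight ends up zero or negative are dropped.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)
    updated = {}
    for t in terms:
        weight = alpha * query.get(t, 0.0)
        if relevant:
            weight += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            weight -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if weight > 0:
            updated[t] = weight
    return updated

# Reviewer feedback on a first round of results (toy term-frequency vectors).
query = {"pension": 1.0}
relevant = [{"pension": 1.0, "accrual": 1.0}, {"pension": 1.0, "benefit": 1.0}]
nonrelevant = [{"pension": 1.0, "party": 1.0}]
new_query = rocchio(query, relevant, nonrelevant)
print(sorted(new_query))  # ['accrual', 'benefit', 'pension']
```

Terms from the relevant documents (“accrual,” “benefit”) enter the query, while “party,” which appears only in the non-relevant document, is pushed below zero and dropped. Repeating this review-and-update loop is the interactive feedback workflow described above.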

Any supervised machine learning system involves creating a training sample and using it to make predictions about a larger population. That anyone could claim patentable ideas in something so widely known and used is puzzling. Any workflow that employs machine learning includes the steps of creating an initial control set, coding it through human review, and applying the learned tags to a larger population. In fact, the Wikipedia article “Learning to rank” describes precisely the workflow claimed in the patent, and as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper describing iterative, sampling-based evaluation and automatic expansion of the initial query. In that paper, we describe exactly the workflow postulated by the six claims of the patent.
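To show how generic this train-and-project workflow is, here is a minimal sketch in plain Python. The classifier is multinomial naive Bayes, chosen only because it fits in a few lines; it is not the method claimed in the patent or the one in our TREC paper. The ten-document collection and the attorney_review function, which stands in for human coding decisions, are hypothetical.

```python
import math
import random
from collections import Counter

def train_nb(docs, labels):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    prior, counts, total = {}, {}, {}
    for c in classes:
        class_docs = [d for d, label in zip(docs, labels) if label == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts[c] = Counter(w for d in class_docs for w in d.split())
        total[c] = sum(counts[c].values())
    return classes, vocab, prior, counts, total

def predict_nb(model, doc):
    """Return the class with the highest log-probability score for doc."""
    classes, vocab, prior, counts, total = model
    def score(c):
        # Words never seen in training carry no information and are skipped.
        return prior[c] + sum(
            math.log((counts[c][w] + 1) / (total[c] + len(vocab)))
            for w in doc.split() if w in vocab)
    return max(classes, key=score)

# Hypothetical ten-document collection (five responsive, five not).
population = [
    "pension accrual formula", "cash balance pension plan", "benefit accrual change",
    "pension plan amendment", "lump sum benefit election",
    "holiday party invitation", "parking garage closed", "lunch menu friday",
    "printer toner order", "office move schedule",
]

def attorney_review(doc):
    """Hypothetical stand-in for a reviewer's responsiveness decision."""
    return "responsive" if ("pension" in doc or "benefit" in doc) else "not responsive"

# 1. Draw a random training sample (6 of 10, so both classes must appear).
rng = random.Random(2012)
train_idx = set(rng.sample(range(len(population)), 6))
# 2. Humans code the sample.
train_docs = [population[i] for i in sorted(train_idx)]
train_labels = [attorney_review(d) for d in train_docs]
# 3. Learn from the coded sample and project tags onto the unreviewed documents.
model = train_nb(train_docs, train_labels)
predicted = {population[i]: predict_nb(model, population[i])
             for i in range(len(population)) if i not in train_idx}
print(f"{len(train_docs)} documents reviewed, {len(predicted)} coded automatically")
```

Swapping in a different learner (Rocchio, an SVM, and so on) changes only the train and predict steps; the sample, code, and project workflow stays the same.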

As for other prior art that could invalidate the patent, the list is long. Let’s start with text classification. Text classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the Proceedings of the Sixteenth International Conference on Machine Learning, and in his book Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, published in The Springer International Series in Engineering and Computer Science. Joachims is now a well-recognized professor of computer science at Cornell University, and that work is widely cited as seminal in machine learning and text classification. Interestingly, the patent examiner cited this work as prior art, but the inventors did not list it. Nevertheless, that work and further work by academics such as Leopold and Kindermann have already established Support Vector Machines as a useful machine learning technique. To claim novelty for their use in automatically coding documents is, in my opinion, hollow.
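As an illustration of how ordinary a linear SVM text classifier is, here is a compact sketch. It trains with the Pegasos stochastic sub-gradient method, one of several well-known ways to fit a linear SVM and my own choice for brevity, not the approach in Joachims’ work or in any product. The four training snippets, the regularization value lam, and the epoch count are assumptions; a constant bias feature is added to each document so the model can learn an offset.

```python
import random

def features(text):
    """Binary bag-of-words features plus a constant bias feature."""
    x = {w: 1.0 for w in text.lower().split()}
    x["__bias__"] = 1.0
    return x

def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(texts, labels, lam=0.1, epochs=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    labels are +1 (responsive) or -1 (not responsive).
    """
    rng = random.Random(seed)
    data = [features(t) for t in texts]
    w = {}
    for t in range(1, epochs * len(data) + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)                         # decreasing step size
        violates = labels[i] * dot(w, data[i]) < 1    # inside the margin?
        w = {k: (1 - eta * lam) * v for k, v in w.items()}  # regularizer step
        if violates:                                  # hinge-loss sub-gradient step
            for k, v in data[i].items():
                w[k] = w.get(k, 0.0) + eta * labels[i] * v
    return w

def classify(w, text):
    return 1 if dot(w, features(text)) >= 0 else -1

texts = ["pension accrual formula", "cash balance pension plan",
         "office holiday party", "parking garage closed"]
labels = [1, 1, -1, -1]
w = train_linear_svm(texts, labels)
print(classify(w, "pension plan"), classify(w, "holiday party"))
```

On this toy data the learned weights should mark “pension plan” as responsive (+1) and “holiday party” as not responsive (-1), because only the responsive examples contain “pension” and “plan.”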

Another technology mentioned in passing is Latent Semantic Indexing (LSI). LSI was proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R. in their paper “Indexing by Latent Semantic Analysis,” Journal of the ASIS, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching, and text classification is also very widespread, and once again, it seems ridiculous to claim it as something novel or innovative.
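Mechanically, LSI is a truncated singular value decomposition (SVD) of the term-document matrix. The sketch below uses NumPy (an added dependency) on an invented five-term, four-document matrix, keeps k = 2 latent dimensions, and folds a query into that space following the standard textbook formulation. The point of the example: the query “retirement” ends up similar to d1, which never contains that word but shares “pension” with a document that does.

```python
import numpy as np

docs = ["d1: pension accrual", "d2: pension retirement", "d3: holiday party", "d4: party"]

# Term-document count matrix: rows are terms, columns are documents.
A = np.array([
    [1, 1, 0, 0],  # pension
    [0, 1, 0, 0],  # retirement
    [1, 0, 0, 0],  # accrual
    [0, 0, 1, 1],  # party
    [0, 0, 1, 0],  # holiday
], dtype=float)

# Truncated SVD: keep only the k largest singular values ("concepts").
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

def fold_in(term_counts):
    """Project a query (or new document) into the k-dimensional LSI space."""
    return (U_k.T @ term_counts) / s_k

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0, 1, 0, 0, 0], dtype=float)  # the single term "retirement"
q_k = fold_in(query)
# Column j of Vt_k is document j in the same space (fold_in(A[:, j]) == Vt_k[:, j]).
scores = [cosine(q_k, Vt_k[:, j]) for j in range(A.shape[1])]
for name, score in zip(docs, scores):
    print(f"{name}: {score:.2f}")
```

With raw term matching, d1 would score zero for this query; the retained dimension that groups “pension,” “retirement,” and “accrual” is what lets it surface.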

Next, let’s examine the use of sampling to validate the initial control set. Sampling to validate a control set of documents is such a widely known technique that most e-discovery productions already employ it. In fact, the Sedona Commentary on Achieving Quality and the EDRM Search Guide both recommend sampling to validate automated searches. Furthermore, several e-discovery opinions, such as Judge Grimm’s opinion in Victor Stanley [Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008)], suggest that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.
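A common way to validate an automated search by sampling is to draw a simple random sample from the documents the search excluded, have reviewers code that sample, and project how many responsive documents were left behind. Here is a minimal sketch; the review callback, the collection size, and the 1-in-200 responsive rate are hypothetical stand-ins for real reviewer decisions.

```python
import random

def estimate_missed(excluded_ids, review, sample_size, seed=None):
    """Estimate how many responsive documents a search left behind.

    excluded_ids: documents the automated search did NOT select.
    review(doc_id) -> bool: hypothetical stand-in for a human reviewer's call.
    Returns (estimated responsive rate in the excluded set, projected count).
    """
    rng = random.Random(seed)
    sample = rng.sample(excluded_ids, sample_size)  # simple random sample, no replacement
    responsive = sum(1 for doc_id in sample if review(doc_id))
    rate = responsive / sample_size
    return rate, round(rate * len(excluded_ids))

# Hypothetical: 40,000 documents fell outside the search; 1 in 200 is responsive.
excluded = list(range(40_000))
rate, missed = estimate_missed(excluded, lambda d: d % 200 == 0, sample_size=1_500, seed=7)
print(f"estimated rate {rate:.2%}, about {missed:,} responsive documents not retrieved")
```

A point estimate like this is far more useful when it is reported together with a confidence interval.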

In short, we think the patent’s claims describe a workflow so commonly used that it is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to take existing prior art and devise a similar workflow that achieves the same or a better outcome, and any attempt to enforce the patent will likely be challenged.

But there is an even bigger issue at stake here beyond the status of Recommind’s patent: shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license its technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery more than two years before Recommind announced its first version of Axcelerate.

Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the kind of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect an organization’s investments are fine, but process patents that attempt to patent a workflow are bad for business. With a competitive approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much room to grow and develop) can be realized in a way that both gives vendors the fair and just economic rewards they deserve and helps the legal system become radically more efficient.

Judge Scheindlin Decides that the Metadata is “Integral” in FOIA Case: Fmr. Judge Ron Hedges Weighs In

Monday, February 28th, 2011

Just as when Judge Scheindlin penned Pension Committee, her latest opinion is already garnering a ton of buzz.  In Nat. Day Laborer Org. Network v. United States Immigration and Customs Enforcement Agency (“NDLON”), 2011 WL 381625 (S.D.N.Y. Feb. 7, 2011) Judge Scheindlin boldly takes on four governmental agencies (ICE, the Department of Homeland Security, the Federal Bureau of Investigation, and the Office of Legal Counsel) over metadata production in response to FOIA demands.

In NDLON, Plaintiffs submitted identical twenty-one-page FOIA requests to each of the four defendant agencies. After some initial missed deadlines and judicial intervention, Plaintiffs sent the defendants a proposed protocol that requested a specific format for the production of electronic records. Significantly, the proposed protocol was based on the “format demands routinely made by two government entities-the Securities and Exchange Commission and the Department of Justice Criminal Division” (invoking the old “good for the goose” argument).

Before ruling on the protocol, Judge Scheindlin examined the parties’ efforts to cooperate and she was uniformly underwhelmed:

“As far as I can tell from the record submitted by the parties, the equivalent of a Rule 26(f) conference, at which the parties are required to discuss form of production, was not held and no agreement regarding form of production was ever reached. Nor was a dispute regarding form of production brought to the Court for resolution.”

In evaluating controlling law, the fact that “[n]o federal court has yet recognized that metadata is part of a public record as defined in FOIA” didn’t stop Judge Scheindlin from looking to both state law and the FRCP for guidance.  Next, she relied on Aguilar, which noted that the Sedona Conference abandoned an earlier presumption against the production of metadata in recognition of “‘the need to produce reasonably accessible metadata that will enable the receiving party to have the same ability to access, search, and display the information as the producing party ….’”  She then foreshadowed her subsequent ruling by concluding: “[b]y now, it is well accepted, if not indisputable, that metadata is generally considered to be an integral part of an electronic record.”

The Government, not surprisingly, didn’t go down without a fight, arguing that “metadata is substantive information that must be explicitly requested and then reviewed by an agency for possible exemptions.” The defendants also claimed that “if the requirements of FOIA and the requirements of the Rules conflict, FOIA must trump the Rules.” Judge Scheindlin wasn’t persuaded, holding that:

“[T]here is no need to decide this question because FOIA does not conflict with the Rules. FOIA is silent with respect to form of production, requiring only that the record be provided in ‘any form or format requested by the person if the record is readily reproducible by the agency in that form or format.’… Defendants’ productions to date have failed to comply with Rule 34 or with FOIA.”

As for a remedy, she did cut the government some slack: “Because no metadata was specifically requested in Plaintiffs’ July 23 e-mail, and because this is an issue of first impression, I will not require Defendants to re-produce all of the records with metadata.” But for future productions, she ordered that the bulk of the ESI be produced in “TIFF image format but with corresponding load files, Bates stamping, and the preservation of “parent-child” relationships (i.e. the association between an attachment and its parent record),” and she specified the following metadata fields for non-email files:

  1. Identifier
  2. File Name
  3. Custodian
  4. Source Device
  5. Source Path
  6. Production Path
  7. Modified Date
  8. Modified Time
  9. Time Offset Value
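For readers who have not worked with load files: a load file is, at heart, a delimited table with one row per produced document, linking each document to its metadata. The sketch below writes the nine fields listed above using Python’s csv module. The plain comma-separated layout and every sample value are assumptions for illustration; review platforms each define their own load-file formats and delimiters, and this layout is not taken from the opinion.

```python
import csv
import io

# The nine non-email metadata fields listed above.
FIELDS = ["Identifier", "File Name", "Custodian", "Source Device", "Source Path",
          "Production Path", "Modified Date", "Modified Time", "Time Offset Value"]

# One hypothetical produced document; every value here is invented.
records = [{
    "Identifier": "ICE-0000001",
    "File Name": "memo.docx",
    "Custodian": "Doe, Jane",
    "Source Device": "Laptop-042",
    "Source Path": r"C:\Users\jdoe\Documents\memo.docx",
    "Production Path": "IMAGES/001/ICE-0000001.tif",
    "Modified Date": "2010-07-23",
    "Modified Time": "14:32:10",
    "Time Offset Value": "-05:00",
}]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)  # values containing commas are quoted automatically
print(buffer.getvalue())
```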

So, here’s the rub. The legal populace, not surprisingly, likes bright-line rules. So when Judge Scheindlin writes (in footnote 41) that “[w]hile not necessary to the holding in this case, I believe that these are the minimum fields of metadata that should accompany any production of a significant collection of ESI,” it’s easy to see how these nine fields may become a blunt instrument wielded haphazardly by requesting parties. Not surprisingly, Judge Scheindlin is aware of the weight her opinions carry and tries to qualify her holding further (in footnote 44):

“To be clear, my Order requiring the use of this Proposed Protocol for future productions-as amended by the specific metadata fields I have required and by the options I have offered the parties regarding the form of production for spreadsheets-is limited to this case. I am certainly not suggesting that the Proposed Protocol should be used as a standard production protocol in all cases. The production of individual static images on a small scale, where no automated review platform is likely to be used, may be perfectly reasonable depending on the scope and nature of the litigation.”

The impact of footnote 44 was top of mind when I recently spoke to Fmr. Judge Ron Hedges, who chimed in:

“Attorneys must confer with regard to production requirements, as they should before bringing any dispute before a federal court. Moreover, attorneys should recognize that, as Judge Scheindlin said in footnote 44, that the selection of metadata fields to request are case-dependent.  Any attempt to arrive at a ‘universal’ or ‘bright line’ standard for production of metadata ignores the text of Rule 34(b) and the bargaining that occurs in meets-and-confers, and the unique aspects of individual civil actions.”

While I agree with Judge Hedges’ sentiment, the main question in my mind is whether footnote 44 will be given its due weight going forward. My concern is that, as is often discussed with her Pension Committee decision, parties may home in on the bright-line test and miss the nuances. While it’s easy to argue against the folly of this thinking, that may not stop it from happening in the near term.

Finally, in another shout-out to the Cooperation Proclamation, Judge Scheindlin takes a swipe at counsel, who forced her to rule on an “e-discovery issue that could have been avoided had the parties had the good sense to ‘meet and confer,’ ‘cooperate’ and generally make every effort to ‘communicate’ as to the form in which ESI would be produced.”

“The quoted words are found in opinion after opinion and yet lawyers fail to take the necessary steps to fulfill their obligations to each other and to the court. While certainly not rising to the level of a breach of an ethical obligation, such conduct certainly shows that all lawyers-even highly respected private lawyers, Government lawyers, and professors of law-need to make greater efforts to comply with the expectations that courts now demand of counsel with respect to expensive and time-consuming document production. Lawyers are all too ready to point the finger at the courts and the Rules for increasing the expense of litigation, but that expense could be greatly diminished if lawyers met their own obligations to ensure that document production is handled as expeditiously and inexpensively as possible. This can only be achieved through cooperation and communication.”

In the end, NDLON will continue to generate a ton of discussion (as did Zubulake and Pension Committee). While this decision won’t single-handedly end the metadata discussion, it will hopefully serve as a launching point for more clarity down the road. For this, practitioners on both sides of the debate should be thankful.

How Do You Sample Electronically Stored Information (ESI) in E-Discovery?

Wednesday, February 9th, 2011

When confronted with an almost impossible data analysis problem, a tried-and-true technique for solving it has been sampling. The mathematics behind sampling has been studied for many years, and sampling has been put into practice for well over seventy years, in fields ranging from predicting election results to assessing the quality of light bulbs. Why not do the same for certifying your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the Electronic Discovery Reference Model (EDRM) Search Group authors covered in detail, laying out a strategy in the comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). While much of that work has yet to become general practice in mainstream litigation, I was pleasantly surprised to see it receive attention from fellow blogger and litigator Nick Brestoff, who highlighted it in a thoughtfully crafted Law.com article titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties of getting a certifiable result that attorneys can stand behind. It is highly likely that current practice is to certify electronic discovery productions without any real measure of validity behind them.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the party that has the data, i.e., the producing party. While Federal Rule of Civil Procedure (FRCP) 26(a) lists required disclosures and Rule 26(g) sets out signing and certification requirements, there is no agreed-upon way to specify sampling parameters or to report the results of sampling. It is in this context that Nick Brestoff’s article is significant: it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that this approach has a logistical problem: most litigators will balk at producing largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to specify, at the Rule 26(f) meet and confer, that the production be certified for completeness, including a statement on sampling and its results. A simple request such as “Sample the data for a 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing party can then perform random sampling to meet that request, selecting 13,526 documents (based on the sampling table in the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target.
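Where does a number like 13,526 come from? A common textbook formula for the sample size needed to estimate a proportion is n = z^2 * p * (1 - p) / e^2, where z is the critical value for the confidence level, p is the assumed responsive rate (0.5 is the most conservative choice), and e is the margin of error. The Python sketch below is my own illustration, not the EDRM Guide’s method, and it assumes z is rounded to three decimals as printed tables usually do. Under those assumptions, 98% confidence gives 13,526 when e is ±1% (an interval 2% wide) and 3,382 when e is ±2%, so it is worth agreeing at the meet and confer which reading of “2% error” is intended.

```python
import math
from statistics import NormalDist
from typing import Optional

def z_score(confidence: float, decimals: int = 3) -> float:
    """Two-sided critical value, rounded as printed sampling tables usually are."""
    return round(NormalDist().inv_cdf(1 - (1 - confidence) / 2), decimals)

def sample_size(confidence: float, margin: float, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Simple random sample size for estimating a proportion.

    margin is the half-width of the confidence interval (0.01 means +/-1%).
    p = 0.5 is the most conservative assumption about the responsive rate.
    """
    z = z_score(confidence)
    n0 = z * z * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction for smaller collections.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size(0.98, 0.01))  # 13526
print(sample_size(0.98, 0.02))  # 3382
```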

In addition to the EDRM Search Guide, The Sedona Conference Working Group Commentary, Achieving Quality in the E-Discovery Process, is an indispensable resource for understanding the role of sampling. The paper discusses several sampling methods at length, along with their applicability for various purposes, including certifying that results meet given quality criteria. In addition, a number of electronic discovery cases have mentioned sampling as a way of coping with the explosion of data volumes. A primary application of sampling is evaluating proportionality claims, which have moved from simple assertions to informed arguments that require specific proof of cost burden. Let’s examine a few.

Citing the well-known Zubulake v. UBS Warburg, F.R.D. 280, the court in Makrakis v. Demelis, No. 09-706-C, 2010 WL 3004337 (July 13, 2010) ordered the producing party to sample just a small number of backup tapes, at the expense of the requesting party. The case is also notable for shifting the cost of processing and reviewing the sample, however small, to the requesting party. Such measures, while reducing overall e-discovery costs, place a greater burden of sample selection on the requesting party, forcing it to apply the reasonableness evaluation.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery was appropriate, quoting an earlier case, S.E.C. v. Collins & Aikman Corp., 256 F.R.D. 403, 418 (S.D.N.Y. 2009), for the proposition that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling approach in this instance reduced the number of custodians from forty to three and significantly narrowed the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced drastically, to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to run a small set of search terms against a smaller number of custodians to get a sense of the larger electronic discovery costs. Clearone Communications v. Chiang offers another example of sampling, using Boolean logic to combine more common search terms and thereby avoid over-inclusiveness.

Per the Sedona commentary definitions, this type of sampling is called “judgmental sampling,” in which the practitioner has a general sense of which custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery can embrace the benefits of statistical sampling as well. It is a natural next step, because even with the narrow criteria of judgmental sampling, the cost of review can be high. One advantage of statistical sampling is that it yields quantifiable measures of error and confidence intervals, whereas judgmental sampling offers no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality, and if the producing party needs a basis for certifying its productions, statistical sampling can be a powerful aid.
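Here is what “quantifiable measures of error” looks like in practice: after reviewing a random sample, you can report not just a responsiveness rate but an interval around it. The sketch below uses the simple normal-approximation (Wald) interval; the sample counts and the 250,000-document collection size are hypothetical, and for very low rates a Wilson or exact interval is generally a better choice.

```python
import math
from statistics import NormalDist

def proportion_interval(responsive: int, sample_size: int, confidence: float = 0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = responsive / sample_size
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical result: reviewers found 120 responsive documents in a random
# sample of 1,000 drawn from a 250,000-document collection.
low, high = proportion_interval(120, 1000)
print(f"responsive rate between {low:.1%} and {high:.1%}")
print(f"roughly {low * 250_000:,.0f} to {high * 250_000:,.0f} responsive documents")
```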

Moody v. Turner: An E-Discovery Battle with No Winners

Friday, December 3rd, 2010

The electronic discovery blogosphere is filled with analysis of the recent opinion in Moody v. Turner by Judge Sandra Beckwith of the U.S. District Court for the Southern District of Ohio. What is striking about the case is that it reveals a large gap in understanding the pitfalls of prolonged discovery disputes, despite efforts by thought leaders to prevent exactly the issues raised in this opinion. As Ralph Losey’s excellent post indicates, it is an affront to have this play out in front of Judge Beckwith, a signatory to The Sedona Conference Cooperation Proclamation.

In reviewing the facts of the case, here are highlights on some of the process missteps:

Lack of Early Data Analysis

Some do not appreciate how important it is to analyze the data early, before agreeing to search ESI for a certain number of custodians using certain keywords. This case illustrates three reasons why early data analysis is critically important.

First, the producing party must identify and communicate the right list of custodians, and must communicate any change or expansion of scope as well. In this case, the Defense team, at its pre-trial 26(f) conference with the Plaintiffs, agreed to produce ESI for twenty-six custodians but chose to send Preservation Notices to a larger number of individuals. While this act by itself is commendable, the failure to communicate it promptly to the Plaintiffs was a misstep, one the Plaintiffs latched onto as evidence of incomplete ESI production.

Second, the producing party must have a handle on the scope of searches before committing to “run them”. In reviewing Case: 1:07-cv-00692-SSB Doc #: 43, Exhibit 7, it is apparent that the twenty production requests in that document are not trivial. An early analysis of both the data and the searches, at least on a small sample, would have helped the producing party understand the scope and challenges of running those searches.

Third, the producing party must evaluate its collection, search, and production methods to determine the feasibility of producing metadata. As the Plaintiffs’ motion shows (Doc-89, page 19), the Defense did not produce TIFF images along with searchable text. However, as noted in Doc-118 (page 18, footnote 10):

“In any event, parties are generally not required to produce the metadata of their data sets. See Wyeth v. Impax Labs., Inc., No. 06-222, 2006 WL 3091331 at *2… Turner has produced all ESI in TIFF format, except for Excel spreadsheets which were produced in native format given the substantial size of many of the spreadsheets (which, if in TIFF format, may print across hundreds of pages). Judge Hogan therefore rightfully declined to compel Turner to produce any additional metadata.”

This is a fairly common request, and one the Plaintiffs could have raised at their pre-trial 26(f) conference.

Out of Control Production Requests

In reviewing the aforementioned court document, Doc #: 43, Exhibit 7, one can glean a wealth of information on the nature of the searches requested by the Plaintiffs and the Defense team’s responses. The immediate problem evident in these requests is one the Defense team raised: the search requests are overly broad. Some of the search terms are “plan”, “method”, “rate” and “account”, which are certain to hit a very large number of documents. One of the requests is shown below.

Production 1-Item 2: All documents other than emails that can be electronically or digitally searched as containing one or more terms that concern the Plan in any way or cash balance pension plans and contain the word “accrual,” “benefit,”, “benefit accrual,” “accrual of benefit”, “accrual methods,” … “calculate”, “calculation”.

This goes on for about eighteen pages. Combined, the twenty production requests would clearly hit almost every collected document (118GB in total), making a follow-on privilege or confidentiality review prohibitively expensive. It is the lack of specificity in these searches that makes the discovery request overly broad. On the other hand, the Defense’s response also appears poorly constructed: it repeats the same boilerplate text, which didn’t escape the notice of the Plaintiffs or the court.

“Defendants object to this Request because it is overly broad, unduly burdensome, seeks documents that are neither relevant nor likely to lead to the discovery of admissible documents and (because Plaintiffs define “documents” to include electronic or computerized data compilations) seeks electronic documents that are not reasonably accessible due to undue burden and/or cost. Defendants further object to this Request because it implicates documents protected by the attorney-client and/or work-product privilege and any such documents will be withheld from production”

What would have helped the Defense’s case is actual data supporting its claims. For example, had the defendants tabulated actual document and/or hit counts for words such as “plan” and “benefit,” it would have bolstered their claim. Predictably, the Plaintiffs responded with a further filing, Doc-89, listing a host of complaints, chief among them:

Defendants reported only (1) the total number of unique documents captured by the search of 17 terms and (2) the number of documents that contained the term “cash balance” but none of the Plaintiffs’ other terms. See Doc. 77-10 at 2.
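Producing that kind of tabulation is not hard. The sketch below is a hypothetical illustration, not any review platform’s reporting feature: for each single-word term it counts the documents hit and the total occurrences, plus the number of documents hit by at least one term. The three-document collection is invented, and a real tabulation would also need to handle phrases, stemming, and de-duplication.

```python
import re
from collections import Counter

def hit_report(documents, terms):
    """Tabulate per-term document counts and hit counts for single-word terms,
    plus the number of documents hit by at least one term."""
    doc_hits, total_hits = Counter(), Counter()
    docs_with_any = 0
    for text in documents:
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        matched = [t for t in terms if counts[t] > 0]
        for t in matched:
            doc_hits[t] += 1
            total_hits[t] += counts[t]
        docs_with_any += bool(matched)
    return doc_hits, total_hits, docs_with_any

collection = [
    "The plan benefit rate changed in 2005.",
    "Plan plan account review.",
    "Lunch menu for Friday.",
]
terms = ["plan", "benefit", "rate", "account"]
doc_hits, total_hits, any_hits = hit_report(collection, terms)
for t in terms:
    print(f"{t:10} {doc_hits[t]:>5} docs {total_hits[t]:>7} hits")
print(f"documents hit by any term: {any_hits} of {len(collection)}")
```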

Furthermore, the Plaintiffs appear to be on the right track, recommending:

On October 14, Plaintiffs wrote to Defendants and proposed an “iterative search process” to decide on a final set of search terms.

It seems clear that, in the ongoing discovery disputes, an iterative search process was perceived as contrary to zealous advocacy of the client’s position rather than as a path to resolving further disputes, as the Cooperation Proclamation suggests. In this context, engaging a search expert is essential: someone who can add more restrictive criteria to limit your search results. Why bother running an open-ended search and producing 29.4GB of useless junk when you can combine these terms with Boolean, proximity, and other searches? The types of searches, and what each can offer, are a topic the members of EDRM tackled in formulating their EDRM Search Guide, which is a must-read for anyone attempting to construct e-discovery searches.
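To see how much Boolean and proximity operators can narrow a broad term like “plan,” consider this toy Python sketch. The four documents are invented, and the within function is a simplified, hypothetical stand-in for the proximity operators (often written as something like w/5) that search tools implement, each with its own exact semantics.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def within(tokens, a, b, distance):
    """True if terms a and b occur within `distance` words of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

docs = {
    "DOC-1": "The cash balance plan changes the benefit accrual formula.",
    "DOC-2": "Our marketing plan for the spring campaign.",
    "DOC-3": "Please submit your vacation plan and expense account by Friday.",
    "DOC-4": "Accrual of benefits under the plan is described in section 4.",
}

# Open-ended search: any document containing "plan".
broad = [d for d, text in docs.items() if "plan" in tokenize(text)]

# Narrowed search: ("plan" within 5 words of "accrual") OR ("cash" AND "balance").
def narrowed(text):
    tokens = tokenize(text)
    return within(tokens, "plan", "accrual", 5) or ("cash" in tokens and "balance" in tokens)

narrow = [d for d, text in docs.items() if narrowed(text)]
print(broad)   # ['DOC-1', 'DOC-2', 'DOC-3', 'DOC-4']
print(narrow)  # ['DOC-1', 'DOC-4']
```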

Proportionality Arguments Without Strong Basis

An important point to note is that any discovery request that uses inefficient processes and inappropriate technologies will certainly result in undue burdens and cost. It appears that the Defense team did not offer proper cost estimates (arguments put forth in Doc-77-10 notwithstanding), and simply pushed an undue burden/cost argument in the hope that the courts would absolve them of discovery obligations. At the same time, the Plaintiffs did seem to have overreached a bit in extending their discovery disputes in the hope of reaching a favorable outcome. Two examples of such attempts are:

  1. Upon the Defense’s production of documents (Doc-118):

Turner has produced every responsive, non-privileged document obtained through the email ESI searches that related to the Plan; these comprise 4.1 GB, or more than 40,708 pages of documents.

The Plaintiffs counter with:

“Plaintiffs maintain that Turner should be compelled to produce the metadata for the email ESI it has produced because otherwise they allegedly “cannot know whether Defendants have searched all 33 custodians’ email files” and “cannot confirm whether any email files were electronic in origin (rather than printouts of emails) or determine whose files they came from.””

As noted earlier, requests for metadata, and the feasibility of producing it, must be negotiated specifically at the Rule 26(f) conference.

  2. The Plaintiffs’ attempt to expand discovery by compelling production from any and every third party, including the Defense’s former law firms, and by inspecting “shared network drives,” “non-shared drives,” etc.

“Judge Hogan recognized that Turner should not be compelled to probe through the recesses of its internal electronic systems for even more ESI on top of the 47,000-plus hard copy documents and the 40,000-plus pages of ESI it has produced – because those additional searches are not likely to lead to the discovery of any evidence relevant to plaintiffs’ claims. Judge Hogan was presented with the gory history of Turner’s efforts to search through “shared network drives” and “non-shared drives,” emails and backups. He found these efforts to be sufficient, and rightly rejected plaintiffs’ demand for additional ESI.”

One can see that the Plaintiffs’ attempt to drag the electronic discovery efforts into an endless battle was counterproductive.

Final Takeaway

The Sedona Conference Cooperation Proclamation rightly recommends “Jointly developing automated search and retrieval methodologies to cull relevant information”. As the costs of getting to the facts escalate, a comprehensive strategy that uses the best processes, the best technology, and a commitment to the Cooperation Proclamation is essential for the legal system to deliver what people expect: justice based on facts. Gamesmanship as evidenced in Moody v. Turner is detrimental to this cause.

Fulbright Litigation Survey Calls Out Need for More Proportionality/Rules Changes

Thursday, November 11th, 2010

Fulbright & Jaworski recently issued its “7th Annual Litigation Trends Survey Report” and there were several interesting trends worth noting. Not surprisingly, the general pace of litigation is forecast to keep rising, relatively unabated, with more than 25% of respondents expecting their companies’ disputes to increase in the next 12 months.

Beyond this trend, there’s clearly also a groundswell of support for more e-discovery proportionality. Proportionality was a big topic at Sedona’s annual conference (and was discussed in the recent Moody case), and a whopping 79% of US respondents think the “US Rules of Civil Procedure should be modified in some way to limit e-discovery in civil cases.” While I haven’t heard of any specific proposals for a rules amendment, it’s clear that folks aren’t happy with the status quo, particularly given the increasing discovery burden facing enterprises dealing with unilateral disputes. This discontent is likely tied to the fact that costs continue to escalate, with the survey indicating that more than 40% of the largest US companies (over $1B in revenue) plan to “increase their spending on e-discovery in the next 12 months.”

Finally, the survey also focused on an area that’s getting an increasing level of scrutiny. Fulbright asked, “when preserving potentially relevant information in litigation or an investigation, what methods do you use most frequently for preserving electronically stored information?” Leading the pack, with 55% of the vote, was “rely on individual custodians to identify and preserve their own information.” Custodian-based collections have recently come under fire in blogs and in cases such as Pension Committee and Ford Motor Co. v. Edgewood Properties Inc. The concern is that under-supervised or unsupervised collection methodologies are dangerous because it’s relatively easy to paint the custodians at issue as either motivated to hide responsive data or relatively unconcerned with compliance. Nevertheless, it’s clear that (as of now) custodian-based collections are still somewhat “reasonable,” given that more than half of respondents collect data this way.

On the other side of the spectrum from custodian-based ESI collections are automated data collection tools and methods. There are undoubtedly advantages (risk reduction, speed, audit trails, etc.) to using “automated search software” for the collection of data, as 43% of the respondents in the Fulbright survey did. Yet this clearly isn’t a zero-sum game: there’s currently a place for both methodologies in the legal landscape. For many organizations it becomes a risk management exercise, as summarized in a recent ARMA article entitled “Is ‘Manual’ Collection of ESI Defensible?”: “Companies may choose the manual collection of ESI to reduce costs, particularly if they have limited levels of litigation or lower risk levels posed by the litigation itself.”

In the end, like so many aspects of electronic discovery, almost any well thought out, well documented methodology *can* be defensible, but the onus is on the preserving/collecting party to buttress whatever poison they pick.  Defaulting into a method without preparation, auditing and follow-through is a recipe for disaster.

Kroll Ontrack and Iron Mountain Stratify Demonstrate That “Free” Is Usually NOT The Cheapest Solution For Electronic Discovery

Tuesday, June 1st, 2010

Every car dealer knows he should focus customers on the monthly payment, not the total cost of the car. Every credit card solicitation (or sub-prime mortgage, for that matter) starts with the offer of 0% interest, not the actual interest rate or fees the customer will pay after the first 6 months. The reason is simple: once you lease the car or put a balance on the credit card, it’s very hard to switch away when – as often happens – you find yourself paying much more than you should later on.

I was reminded of these examples when reading about Kroll Ontrack’s offer of “free ECA” and Stratify’s recent press release announcing “free early stage filtering” for electronic discovery. Taking each in turn:

Kroll Ontrack Advanceview

Based on feedback from several customers in Washington DC, New York, and the Mid-West, Kroll Ontrack often provides Advanceview at no charge. That means customers can get “custodian de-duplication” and “1 keyword and date filter pass” for free, although Kroll still charges $200-250/hour for doing the work. The resulting data set is then processed and loaded into its review platform for $1,500-$1,800 per gigabyte.

Is this a good deal? For the vast majority of customers, the answer is “no” for three reasons.

First, customers typically end up paying more than they would using alternative products. For example, in the chart below, we compare the cost of using Kroll Ontrack to that of Clearwell for a 100 gigabyte project. In both cases, we assume customers are doing de-duplication, filtering, keyword searching, first pass review, and load file creation. As with any comparison of this sort, you have to make some simplifying assumptions. For example, we excluded data hosting fees and professional services fees from the analysis.

Whether customers are better off with Kroll depends entirely on how much data is culled out for free before customers incur the high back-end charges. Given that all Kroll is doing for free is custodian de-duplication and running one set of keywords and date filters, the typical cull rate is likely to be anywhere from 20% to 50%, nowhere near the 80% cull rate required for Kroll to be more cost effective than Clearwell.
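The break-even logic can be written out explicitly. This is a minimal sketch that, like the comparison above, ignores hosting and professional services fees; it also ignores Kroll’s $200-250/hour charges, and the alternative product’s total cost is treated as a parameter rather than a quoted price. Only the $1,500 per gigabyte low-end price, the 100 gigabyte project size, and the 20%-50% cull range come from the text; the helper names and the $30,000 alternative cost are hypothetical. If V gigabytes are culled at rate c and the remainder is charged at p per gigabyte, the back-end charge is V(1 − c)p, so it equals an alternative costing C_alt when c = 1 − C_alt / (V·p).

```python
def backend_cost(volume_gb: float, cull_rate: float, price_per_gb: float) -> float:
    """Hypothetical helper: per-gigabyte charge on the data that survives the free cull."""
    return volume_gb * (1.0 - cull_rate) * price_per_gb


def break_even_cull(alt_cost: float, volume_gb: float, price_per_gb: float) -> float:
    """Hypothetical helper: cull rate at which the back-end charge equals alt_cost."""
    return 1.0 - alt_cost / (volume_gb * price_per_gb)


volume = 100.0  # GB, as in the comparison above
price = 1500.0  # low end of the quoted $1,500-$1,800 per GB (hourly fees ignored)

# The 20%-50% cull range leaves a wide spread of back-end charges.
for cull in (0.2, 0.5):
    print(f"cull {cull:.0%}: ${backend_cost(volume, cull, price):,.0f}")
# cull 20%: $120,000
# cull 50%: $75,000

# A hypothetical alternative costing $30,000 for the same 100 GB
# breaks even only at an 80% cull.
print(f"{break_even_cull(30000.0, volume, price):.0%}")  # 80%
```

The spread between the two cull rates also illustrates the cost-certainty problem: the same 100 gigabyte project can land anywhere in a $45,000 range depending on how much the free pass happens to cull.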

The second reason why this is not a good deal is that it gives customers no certainty about costs. Culling rates from de-duplication and blind keyword searches are unpredictable and vary widely, meaning that some projects will cost more than expected while others will cost less. But every project has a budget that’s determined up front and, as any litigation support manager will tell you, you get much less credit for being under budget than you get pain for going over budget. That’s why cost certainty is one of the leading requests from anyone involved in electronic discovery.

Finally, excluding data based on a single round of keyword searches and date filters is not in line with The Sedona Conference best practices. Rather, Sedona recommends that customers iterate their keywords and culling strategies to hone them appropriately.

Iron Mountain Stratify OnPoint

It is not yet possible to do the same detailed analysis on Stratify’s OnPoint which offers “free early stage filtering”, because it’s impossible to tell exactly what that means. In its artfully-worded press release and data sheet, Stratify promises to provide “free processing and loading of unlimited data for early stage filtering”. Does that include de-duplication? Does that include any keyword searching? My guess is “no”, in which case all they are really doing for free is offering to load data into their review platform so that they can then charge you – not a very compelling offer. But if anyone does know the answer to these questions, or if Stratify would like to clarify exactly what’s being offered for free, then please let me know and I’ll post an update.

Once data is in Stratify’s system, it charges a “one-time fee starting at $500 per gigabyte” for “reviewable data”. But it does not say if that’s the only fee. What about monthly hosting charges? Fees for additional reviewers? Again, it’s not yet clear what the downstream cost of review really is using Stratify, so it’s impossible to know whether this is a good deal.

If there’s one lesson from all of this, it’s “buyer beware”. Just as when you buy a car, sign up for a credit card, or click on that offer to get more corn on Farmville, you need to look beyond the “free offer” and understand what it’s really going to cost you.