Archive for the ‘Sedona Conference’ Category

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This post explores the promise of predictive coding technology, examines why predictive coding has not been widely adopted in eDiscovery, and explains why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page-by-page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or daydream, many believe computers can even predict document relevance better than their human counterparts.

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user.  This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.
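To make the sampling concern concrete, here is a minimal sketch of a sample draw that is easy to explain and to repeat.  It is not how any particular predictive coding product selects its samples; the function name, the document ID format, and the seed value are all hypothetical, chosen only for illustration.

```python
import random


def draw_review_sample(doc_ids, sample_size, seed):
    """Draw a simple random sample of document IDs without replacement.

    Sorting the IDs first and recording the seed makes the draw repeatable,
    so the same sample can be regenerated if the process is challenged.
    """
    if sample_size > len(doc_ids):
        raise ValueError("sample_size exceeds the number of documents")
    rng = random.Random(seed)  # isolated generator; the seed would be logged
    return rng.sample(sorted(doc_ids), sample_size)


# Hypothetical collection of 10,000 document IDs.
collection = [f"DOC-{i:05d}" for i in range(10_000)]
sample = draw_review_sample(collection, sample_size=400, seed=2012)
```

Because every document has the same chance of selection and the draw can be reproduced from the recorded seed, the method can be described to opposing counsel or a judge in a sentence or two.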

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if it is inadvertently produced to the opposing party.
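The Larry Lawyer scenario can be illustrated with a small sketch.  The document structure (a dict with "text" and "metadata" keys), the function name, and the sender's name are assumptions made for this example; real review platforms represent documents and run privilege screens in their own ways.

```python
def mentions_attorney(doc, attorney_names, use_metadata=True):
    """Return True if any attorney name appears in the document.

    `doc` is a hypothetical dict with a "text" string and a "metadata"
    dict mapping field names (such as "from" and "to") to strings.
    """
    fields = [doc.get("text", "")]
    if use_metadata:
        fields.extend(doc.get("metadata", {}).values())
    names = [name.lower() for name in attorney_names]
    return any(name in field.lower() for field in fields for name in names)


email = {
    "text": "Please advise whether we can terminate the supply contract.",
    "metadata": {"from": "Carla Client", "to": "Larry Lawyer"},
}

# A text-only check misses the attorney; adding metadata catches him.
text_only = mentions_attorney(email, ["Larry Lawyer"], use_metadata=False)
text_and_metadata = mentions_attorney(email, ["Larry Lawyer"])
```

In this toy case the text-only check finds nothing, while the check that also reads the recipient field flags the email for privilege review.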

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern what language led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive set containing many irrelevant documents.  In information retrieval terms, this is a problem of low precision rather than of recall, and it is important because every irrelevant document retrieved adds to the number of documents requiring manual review, which directly impacts eDiscovery cost.
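The over-inclusiveness described above can be quantified with two standard information retrieval measures.  Precision is the share of retrieved documents that are actually relevant; recall is the share of relevant documents that were retrieved.  The sketch below uses made-up document labels purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a retrieved set of documents."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 truly relevant documents, 10 retrieved.
relevant_docs = {"A", "B", "C", "D"}
over_inclusive = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J"}

precision, recall = precision_recall(over_inclusive, relevant_docs)
# precision is 0.4 (6 of the 10 retrieved documents are irrelevant),
# recall is 1.0 (every relevant document was found).
```

An over-inclusive result like this one finds everything that matters but still sends six irrelevant documents to manual review, which is where the extra cost comes from.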

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology that hasn’t existed until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.
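As a rough illustration of how a confidence level becomes something parties can negotiate, the sketch below estimates how many documents must be sampled for a given confidence level and margin of error.  It uses the textbook normal-approximation formula for estimating a proportion, with an optional finite population correction.  This is an assumption made for illustration, not the method used by any particular product, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(confidence, margin_of_error, population=None, p=0.5):
    """Estimate the sample size needed to measure a proportion.

    confidence      -- e.g. 0.95 for a 95% confidence level
    margin_of_error -- e.g. 0.02 for plus or minus 2 percentage points
    population      -- total documents; None treats it as very large
    p               -- assumed proportion; 0.5 is the most conservative
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return math.ceil(n)


large_collection = required_sample_size(0.95, 0.02)
smaller_collection = required_sample_size(0.95, 0.02, population=100_000)
```

Tightening the margin of error or raising the confidence level increases the sample, and therefore the review cost, which is exactly the kind of trade-off parties can discuss openly.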

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.

Amending the FRCP: More Questions than Answers

Friday, October 14th, 2011

Outcry from many in the legal community has caused a number of groups to consider whether the Federal Rules of Civil Procedure (FRCP) should be amended.  The dialogue began in earnest a year ago at the Duke Civil Litigation Conference and picked up speed following an eDiscovery “mini-conference” held in Dallas last month (led by the Discovery Subcommittee, which is appointed by the Advisory Committee on Civil Rules).  The rules amendment topic is so hot that the Sedona Conference (WG1) spent most of its two-day annual meeting discussing the need for amendments and evaluating a range of competing proposals.

During this dialogue (which I can’t quote verbatim) a number of things became clear to me…

1.  This rules amendment quandary is a bit of a chicken and egg riddle, meaning that it’s hard to cast support wholeheartedly for a rules change if there isn’t a good consensus for what a particular change would accomplish and what the long term consequences might be as technology quickly morphs.  As an example, if there were a redefined preservation trigger that started the duty to preserve when there was a reasonable “certainty” of litigation (versus a mere “likelihood”), would this really make a material impact?  Or, would this inquiry still be as highly fact specific as it is today?  Would this still be similarly prone to the 20/20 hindsight judgment that’s inevitable as well?

2. While it is clear that preservation has become a more complex and risk-laden process, it’s not clear that this “pain” is causally related to the FRCP.  In the notes from the Dallas mini-conference, a pending Sedona survey was quoted, referencing the fact that preservation challenges were overwhelmingly increasing:

“[S]ome trends can be noted. 95% (of the surveyed members) agreed that preservation issues were more frequent. 75% said that development was due to the proliferation of information.”

3. Another camp of stakeholders complains that the existing rules (as amended in 2006) aren’t being followed by practitioners or understood by the judiciary.  While this may be the case, it then raises the critical question: If folks aren’t following the amended rules (utilizing proportionality, leveraging FRE 502, etc.) is it really reasonable to think that any new rules would be followed this time around?

4. The role of technology in easing the preservation burden represents another murky area for debate.  For example, it could be argued that preservation pains (i.e., costs) are only really significant for organizations that haven’t deployed state-of-the-art information governance solutions (e.g., legal hold solutions, email archives, records retention software, etc.) to make the requisite tasks less manual.

5. And finally, even assuming that the FRCP is magically re-jiggered to ease preservation costs, this would only impact organizations with litigation in Federal court. This leaves many still exposed to varying standards for the preservation trigger, scope and associated sanctions.

So, in the end, it’s unclear what the future holds for an amended FRCP landscape.  Given the range of divergent perspectives, differing viewpoints on potential solutions and the time necessary to navigate the Rules Enabling Act, the only thing that’s clear is that the cavalry isn’t coming to the rescue any time soon.  This means that organizations with significant preservation pains should endeavor to better utilize the rules that are on the books and deploy enabling technologies where possible.

A Judicial Perspective: Q&A With Former United States Magistrate Judge Ronald J. Hedges Regarding Possible Discovery Related Rule Changes

Friday, September 9th, 2011

If you have been following my previous posts regarding possible amendments to the Federal Rules of Civil Procedure (Rules), then you know I promised a special interview with former United States Magistrate Judge Ron Hedges.  The timing of the discussion is perfect considering that a “mini-conference” is being hosted by a Federal Rules Discovery Subcommittee today (September 9th) in Dallas, TX.  The debate will focus on whether or not the Rules should be amended to address evidence preservation and sanctions.  I am attending the mini-conference and will summarize my observations as part of my next post.  In the meantime, please enjoy reading the dialogue below for a glimpse into Judge Hedges’ perspective regarding possible Rule amendments.

Nelson: You were recently quoted in a Law Technology News (LTN) article written by Evan Koblentz as saying, “I don’t see a need to amend the rules” because these rules haven’t been around long enough to see what happens.  Isn’t almost five years long enough?

Judge Hedges: No.  For the simple reason that both attorneys and judges continue to need education on the 2006 amendments and, more particularly, they need to understand the technologies that create and store electronic information.  The amendments establish a framework within which attorneys and judges make daily decisions on discovery.  I have not seen any objective evidence that the framework is somehow failing and needs further amendment.

Nelson: You also said the “big problem” is that people don’t talk enough.  What did you mean?  Hasn’t the Sedona Cooperation Proclamation made a difference?

Judge Hedges: The centerpiece of the 2006 amendments (at least in my view) is Rule 26(f).  I think it is fair to say that the legal community’s response to 26(f) has been, to say the least, varied. Civil actions with large volumes of ESI that may be discoverable under Rule 26(b)(1) cry out for extensive 26(f) meet-and-confer discussions that may take a number of meetings and require the presence of party representatives from, for example, IT.  There is an element of trust required between adversary counsel (with the concurrence of the parties they represent) that may be difficult to establish – but some cooperation is necessary to make 26(f) work.  Overlay that reality with our adversary system and the duty of attorneys to zealously advocate on behalf of their clients and you can understand why cooperation isn’t always a top priority for some attorneys.

However, “transparency” in discussing ESI is essential, along with advocacy and the need to maintain appropriate confidentiality. That’s where the Sedona Conference Proclamation can make a big difference. Has the Proclamation done that? It’s too early to reach a conclusion on that question, but the Proclamation is often cited and, as education progresses in eDiscovery, I am confident that the Proclamation will be recognized as a means to realize the just, speedy, and inexpensive resolution of litigation, as articulated under Rule 1.

Nelson: You also mentioned that the Federal Rules Advisory Committee might be running afoul of the Rules Enabling Act.  Can you explain?

Judge Hedges: There is a distinction between “procedural” and “substantive” rules.  The Rules Enabling Act governs the adoption of the former.  Rule 502 of the Federal Rules of Evidence is an example of a substantive rule that was proposed by the Judicial Conference.  However, since Rule 502 is a rule dealing with substantive privilege and waiver issues, it had to be enacted into law through an Act of Congress.  I am concerned that proposals to further amend the Federal Rules of Civil Procedure may cross the line from procedural to substantive.  I am not prepared to suggest at this time, however, that anything I have seen has crossed the line.  Stay tuned.

Nelson: If you had to select one of the three options currently being considered (see page 264), which option would you select and why?

Judge Hedges: To start, I would not choose option 1, which presumes that the Rules can reach pre-litigation conduct consistent with the Rules Enabling Act.  My concern here is also that, in the area of electronic information, a too-specific rule risks “overnight” obsolescence, just as the Electronic Communications Privacy Act, enacted in 1986, is considered by a number of commentators to be, at best, obsolescent.  Note also that I did not use the word “stored” when I mentioned electronic information, as courts have already required that so-called ephemeral information be preserved.  Nor would I choose option 2.  Absent seeing more than the brief description of the category on page 264, it seems to me that option 2 is likely to do nothing more than be a restatement of the existing law on when the duty to preserve is “triggered.”

So, by default, I am forced to choose option 3.  I presume a rule would say something like, “sanctions may not be imposed on a party for loss of ESI (or “EI”) if that party acted reasonably in making preservation decisions.”  There are a number of problems here. First, in a jurisdiction which allows the imposition of at least some sanction for negligence, all the rule would likely do is be interpreted to foreclose “serious” sanctions. Isn’t that correct? Or is the rule intended to supersede existing variances in the law of sanctions?  At that point, does the rule become “substantive”?   Second, how will “reasonableness” be defined?  Reasonableness supposes the existence of a duty – in this case, a duty to preserve.  For example, is there a duty to preserve ephemeral data that a party knows is relevant?  We come back full circle to where we began.

Remember, Rule 37(f) (now 37(e)) was intended to provide some level of protection against the imposition of sanctions, just as the categories are intended to.  Right?  And five years later 37(e) remains defined variously to be a “safe harbor” or a “lighthouse” by some lawyers such as Jonathan Redgrave or an “uncharted minefield” by others like me.

Nelson: What about heightened pleading standards after the Iqbal and Twombly decisions?  Do these decisions have any relevance to electronic discovery and the topic at hand?

Judge Hedges: Let me begin by saying that I am no fan of Twombly or Iqbal. The decisions, however well intended, have led to undue cost and delay all too often.  Not only is motion to dismiss practice costly for parties, but it imposes great burdens on the United States Courts and, as often as not, leads to at least one other round of motion practice as plaintiffs are given leave to re-plead.  All the while, parties have preservation obligations to fulfill and, in the hope of saving expense, discovery is often stayed until a motion is “finally” decided.  I would like to see objective evidence of the delay and cost of this motion practice (and I expect that the Administrative Office of the United States Courts has statistical evidence already).  I would also like to see objective evidence from defendants distinguishing between the cost of motion practice and later discovery costs.

Putting all that aside, and if I had to accept one option, I would choose to allow some discovery that is integrated to the motion practice.  First, even without the filing of a responsive pleading, there should be a 26(f) meet-and-confer to discuss, if nothing else, the nature and scope of preservation and the possibility of securing a Rule 502(d) order. Second, while I have serious concerns about “pre-answer discovery” for a number of reasons, I would have the parties make 26(a)(1) disclosures while a motion to dismiss is pending or leave to re-plead has been granted in order to address the likely “asymmetry of information” between a plaintiff and a moving defendant.  Once the disclosures are made, I would allow the plaintiff to secure some information identified in the disclosures to allow re-pleading and perhaps obviate the need for continued motion practice.

All of this would, of course, require active judicial management.  And one would hope that Congress, which seems so interested in conserving resources, would recognize the vital role of the United States Courts in securing justice for everyone and give adequate funding to the Courts.

Bit by Bit: Building a Better eDiscovery Collection Solution

Friday, July 29th, 2011

Is there a place in eDiscovery today for hard drive imaging and bit-by-bit copies, which collect deleted items or slack/unused hard disk space?  The answer is yes, with some important limitations.  For the vast majority of matters, ESI can be collected without imaging drives or utilizing proprietary container files.  However, I occasionally still encounter folks who are victims of the dated and costly misconception that eDiscovery always requires the bit-level imaging of hard drives.

There are situations, though, where the existence of data (as opposed to its content) is central to the matter – when companies suspect employees of stealing proprietary information or when employees leave a company under suspicious circumstances.  In these and other similar situations, it may make sense to have the employee’s workstation hard drive imaged for full forensic analysis.  Even in these scenarios, I find that companies are more likely to hire an external investigator to perform this task to allay suspicions of tampering or bias, and the company generally would prefer that this investigator be the one to testify about this sensitive data acquisition.  Then, for ESI beyond the target employee’s hard drive, other collection methods may be used.  As we’re now midway through 2011 – a year in which I expect to see eDiscovery fully embraced by many corporations as a true business process – I wanted to analyze why the forensic disk image myth still exists, where it came from, and what the law really requires of an eDiscovery collections process.

Traditionally, cases that mentioned full forensic imaging of hard drives began their captions with United States v. or State v. because they were criminal matters.  In traditional civil litigation – even the behemoth eDiscovery cases that get all the bloggers blogging – forensic imaging simply is not required or needed.  In fact, in most cases, it will dramatically increase the cost associated with electronic discovery, because this process adds unnecessary complexity in downstream phases of eDiscovery and leads to vast over-collection.  Why collect the Microsoft Office suite 50 times when what you are really required to preserve and collect are the files created with those programs?  When using disk imaging, program files are collected, which drives up storage costs and requires the post-collection step of deNISTing (removing system files based on the NIST list).  Why not leave those system files behind and perform a targeted collection of only user-created content?  In addition, the primary rules governing civil litigation – the Federal Rules of Civil Procedure and Federal Rules of Evidence – simply do not require exact duplication of electronic files.  I am amazed that there are so many experts who are still pushing full forensic imaging and duplication in every case.  In fact, this goes against best practices published by The Sedona Conference, EDRM, and in the E-Discovery textbook co-authored by Judge Shira A. Scheindlin.
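For readers unfamiliar with deNISTing, the idea can be sketched in a few lines: hash each collected file and drop any file whose hash appears in a list of known system and application files.  The sketch below assumes the known-file list has already been loaded as a set of SHA-1 hex digests, and it works on in-memory bytes to stay self-contained; the file names and contents are invented for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of the data as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


def denist(files, known_system_hashes):
    """Keep only files whose hash is NOT in the known-file list.

    `files` maps a file name to its contents (bytes).
    """
    return {
        name: data
        for name, data in files.items()
        if sha1_hex(data) not in known_system_hashes
    }


# Hypothetical collection: one program file and one user-created memo.
collected = {
    "WINWORD.EXE": b"pretend these are program bytes",
    "safety_memo.docx": b"user-created content",
}
# Hypothetical known-file list containing the program file's hash.
known = {sha1_hex(b"pretend these are program bytes")}

user_content = denist(collected, known)  # only the memo survives
```

A targeted collection avoids this step entirely by never picking up the program files in the first place.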

In comment 8c of the Sedona Principles, the authors call making forensic image backups of computers “the first step of an expensive, complex, and difficult process of data analysis that can divert litigation into side issues and satellite disputes involving the interpretation of potentially ambiguous forensic evidence.”  The comment goes on to say that “it should not be required unless exceptional circumstances warrant the extraordinary cost and burden.”  In a whitepaper authored for EDRM by three eDiscovery experts from KPMG, LLC, the authors discussed the high cost of forensic bit-level imaging and, instead, suggested that targeted collection of ESI would be sufficient in the vast majority of non-criminal matters.  They state, “[t]he challenge of Smart EDM [Evidence and Discovery Management] is to obtain targeted files in a forensically sound manner – chain-of-custody established, proven provenance, and metadata intact – without having to resort to drive imaging.”

In Electronic Discovery and Digital Evidence: Cases and Materials, written by Judge Shira A. Scheindlin, Daniel J. Capra, and The Sedona Conference, the authors state that,

“because imaging software is commonly available, and because the vast majority of training programs in the field of electronic discovery revolve around forensics, there is a growing tendency to want to ‘image everything.’  But unless an argument can be made that the matter at hand will benefit from a forensic collection and additional examination, there is no reason to do a forensic collection just because the technology exists to do it.”

So, with the top experts in the field saying the days of “image everything” should be over, why does it still happen?  Why are the victims of this antiquated workflow still paying the exorbitant costs of a solution that does not really meet their requirements?  Perhaps a historical perspective will be helpful in explaining.

Why Drive Imaging and Proprietary Containers?

I do not think there is any debate on the benefit of having a bit-level image of a hard drive in a criminal investigation.  However, traditionally, the investigators using these methods needed a way to get the imaged drive safely back to a lab for further analysis.  Companies or law enforcement agencies that hired third-party investigators to image drives had to transport the data, maintain chain of custody, and preserve all contents in an unalterable state through several phases of the investigation.  And, in criminal matters, it was especially important to maintain the integrity of the evidence when the electronic evidence was central to the government’s case.  Remember, the burden of proof in a criminal matter is “beyond a reasonable doubt” (along with a host of constitutional considerations).  Alteration of key evidence could certainly create reasonable doubt and hose the prosecution’s case (or, worse, the evidence gets tossed by the Court before the trial even begins).  The container file ensures that no matter who handles the evidence, checksums can prove that the contents were not altered since the initial imaging.

Many vendors now offer logical image containers as an alternative to doing a full bit-level image of the drive.  However, in corporate eDiscovery, this is still overkill because the tools and solutions being used downstream still have to unpack or parse these proprietary container formats for processing and analysis.  In fact, even software from the vendors who created these container formats must “crack them open” to get to the contents within.  This seems to add a layer of complexity that has not been needed since the days of the external examiner coming in with her forensic toolkit to do drive images. The format was created to solve a very specific problem, and little thought was given to the use of this format in a holistic process like what is typically seen in civil eDiscovery.   There is no longer a need for a container for portability of evidence because it is most likely going to be processed in place after collection while residing on a secure evidence store on the company’s network.  I have heard “what if our collections methods are challenged?”  And to that, I would respond that we are not in criminal court and that the requirement in civil court is reasonableness, not perfection.  Now, if an employee is suspected of wrongdoing and the potential deletion of files will dramatically alter the case, then by all means, hire a forensic investigator and follow all of the protocols established over the last several decades in computer forensic science.

Fast forward to the 21st century

Corporations are bringing eDiscovery in-house; they are building a business process around it to minimize risk and drive enormous cost savings, and in today’s world of civil litigation, there simply is not a need for these drive images or proprietary containers.  First of all, the burden of proof in a civil matter is “by a preponderance of the evidence.”  What this means is that the burden is satisfied if there is a greater than 50% chance that a proposition is true.  This is a much lower standard than in criminal cases.  But the burden of proof goes more to the weight that evidence is given by the court or jury.  Before that is even considered, evidence must pass several hurdles of admissibility.  As we will explore, vendors have also significantly overstated these standards of admissibility over the years.

The Path to Admissibility

There are several hurdles to admissibility for any type of evidence, and because they are not within the scope of this post, I will forgo any discussion of relevance, FRE 403, or the hearsay rules.  I will focus on the issues that tend to be associated with electronic evidence: authentication and the “best evidence rule”.  There are some examiners and perhaps even vendors that would argue electronic evidence is simply not admissible if not collected using bit-level imaging (and sometimes 2 copies – one that is referred to by examiners as the “best evidence” copy and another “working copy” to be analyzed).  This is simply not true.  What we will find is that the collection method will go more to the weight of the evidence rather than the minimum showing needed for admissibility (hence, the discussion of burden of proof above).

All evidence must be authenticated pursuant to FRE 901.  This is a “don’t pass Go” threshold requirement for admissibility.  FRE 901 is satisfied by “evidence sufficient to support a finding that the matter in question is what its proponent claims.”  Unless a piece of evidence is “self-authenticating” pursuant to FRE 902, the proponent must establish the identity of the exhibit by stipulation, circumstantial evidence, or the testimony of a witness with knowledge of its identity and authorship.  Typically, objections to this process go toward whether the exhibit is an original, whether it was altered, or whether the witness through whom the proponent is attempting to authenticate the exhibit is unable to do so because of a lack of personal knowledge or some other defect.  Mostly these objections deal with the authenticity of the contents of the exhibit, and the rules in Article X of the FRE are helpful here.  Rule 1001 defines an “original” with respect to data stored in a computer or similar device as “any printout or other output readable by sight, shown to reflect the data accurately.”  This is a far cry from a bit-by-bit forensic image!  Rule 1002 – often referred to as the “Best Evidence Rule” – requires that “[t]o prove the content of a writing, recording, or photograph, the original writing, recording, or photograph is required, except as otherwise provided in these rules or by Act of Congress.”  Not only do these rules not require exact duplication of the electronic files, but they do not require imaging the entire 80GB hard drive to collect the 100MB of files that are potentially relevant to the case.  What they do require, though, is the ability to show that a document being proffered is the same document that was originally created.  In re Vee Vinhnee, 336 B.R. 437, 444 (B.A.P. 9th Cir. 2005). Also, Judge Grimm sets out an extremely comprehensive analysis of what is required for the admissibility of electronic evidence in civil litigation in Lorraine v. Markel American Insurance Company, 241 F.R.D. 534 (D.Md. May 4, 2007).  In Lorraine, he notes that In re Vee Vinhnee may set out the most demanding test for admissibility of ESI.

Maintaining Forensic Integrity

So, how do I combat the claims that “they must have altered that document” or “Your Honor, I swear that line about ‘acceptable losses’ was not in the safety memo when I created it”?  This is where the hash value becomes a wonderful thing.  Computing the hash of an electronic file (that is, computing a hexadecimal checksum from the contents of the file) is essentially like recording the DNA of that file.  If the file is altered, its hash value will be different.  So, by computing the hash value at the source, in transit, and at the destination, I can confirm that the electronic file is in exactly the same state as it was at the source (or, that the collected document is the same as the document originally created).  Now, add the ability to report on that information, and those container files and full forensic disk images really do become extreme overkill.
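To make the idea concrete, here is a minimal sketch of hashing a file at the source and again at the destination. It assumes SHA-256 and a plain file copy; the function names `file_hash` and `collect` are illustrative only and do not describe any particular vendor's collection tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_hash(path: Path, algorithm: str = "sha256") -> str:
    """Return the hexadecimal digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect(source: Path, destination: Path) -> dict:
    """Copy one file and record its hash before and after the copy."""
    before = file_hash(source)
    shutil.copy2(source, destination)  # copy2 also attempts to preserve timestamps
    after = file_hash(destination)
    return {
        "file": source.name,
        "source_hash": before,
        "destination_hash": after,
        "verified": before == after,
    }


if __name__ == "__main__":
    # Hypothetical demonstration with a throwaway file.
    with tempfile.TemporaryDirectory() as tmp:
        memo = Path(tmp) / "safety_memo.txt"
        memo.write_text("Original text of the memo.")
        print(collect(memo, Path(tmp) / "collected_memo.txt"))
```

Keeping the resulting records alongside the chain-of-custody log gives a simple report that can be checked again at production time by rehashing the produced file.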

The important distinction here is that the term “forensic” does not refer to a type of technology or the products of a specific vendor – despite claims and propaganda to the contrary.  Forensic refers to the methodology used by the person collecting the evidence – whether it is fingerprints from a weapon or electronic files from an employee’s laptop.  Forensic imaging, however, refers to the process by which an entire hard disk is copied bit by bit to create an exact duplicate of that hard drive in a forensic manner.  It is entirely possible for a collection of ESI to be “forensically sound” by simply employing the technique described above of taking hash values at each stage of the process to be able to prove that the files were not altered during collection.  As long as chain of custody is also maintained (much easier to do now that we are not using multiple tools, vendors, locations, and people to do the job), then the process should meet the threshold admissibility requirements of the Federal Rules of Evidence.

Opponents will still bring up claims that the evidence must have been altered, or the expert familiar only with forensic imaging technologies will try to use the argument that only vendor X’s technology is “court vetted,” so any other method is not acceptable.  But, to these opponents, I would argue two points:

  1. No technology is “court vetted”.  The operator’s use of the technology in the specific case (in a specific jurisdiction) was acceptable to the court to meet the threshold showings required by FRE 901, 1001, and 1002 – as well as any rules of procedure governing the production of discovery in either a civil or criminal matter.  Wow – that would be a very long footnote on a marketing slide…probably why it is not usually mentioned.
  2. The process is forensically sound, and you can prove that the documents were not altered from collection through production by referencing the hash value and maintaining copies of the original native files analyzed on a secured preservation store.  This would exceed the requirements of FRE 901, 1001, and 1002 – but would provide protection against claims going to the “weight” of the evidence by opponents who would cry foul.

What Now?

So, where does all of this leave us?  First, in the vast majority of civil litigation matters where electronic discovery is being performed, forensic bit-by-bit imaging of computer hard drives is simply not required.  Vendors have promoted this practice over the years, but all this has done is over-complicate the eDiscovery process for many unsuspecting litigants and dramatically increase costs because the model simply does not scale.  Moreover, the effort and cost required to deal with these full drive images downstream in the process is often overlooked by these vendors and overzealous consultants.  Next, we now know there is a better way – targeted, forensically sound collection of ESI using streamlined and automated solutions that maintain custodian relationships – even for shared data sources – throughout the eDiscovery lifecycle, preventing form of production disputes and other calamities that have plagued this industry for the last decade.  This approach to collecting ESI can deliver substantial cost savings all the way to production.

Judge Scheindlin Decides that the Metadata is “Integral” in FOIA Case: Fmr. Judge Ron Hedges Weighs In

Monday, February 28th, 2011

Just as when Judge Scheindlin penned Pension Committee, her latest opinion is already garnering a ton of buzz.  In Nat. Day Laborer Org. Network v. United States Immigration and Customs Enforcement Agency (“NDLON”), 2011 WL 381625 (S.D.N.Y. Feb. 7, 2011) Judge Scheindlin boldly takes on four governmental agencies (ICE, the Department of Homeland Security, the Federal Bureau of Investigation, and the Office of Legal Counsel) over metadata production in response to FOIA demands.

In NDLON, Plaintiffs submitted identical twenty-one page FOIA requests to each of the four defendant agencies.  After some initial missed deadlines and judicial intervention, Plaintiffs sent the defendants a proposed protocol that requested a specific format for the production of electronic records.  Significantly, the proposed protocol was based on the “format demands routinely made by two government entities-the Securities and Exchange Commission and the Department of Justice Criminal Division” (invoking the old “good for the goose” argument).

Before ruling on the protocol, Judge Scheindlin examined the parties’ efforts to cooperate and she was uniformly underwhelmed:

“As far as I can tell from the record submitted by the parties, the equivalent of a Rule 26(f) conference, at which the parties are required to discuss form of production, was not held and no agreement regarding form of production was ever reached. Nor was a dispute regarding form of production brought to the Court for resolution.”

In evaluating controlling law, the fact that “[n]o federal court has yet recognized that metadata is part of a public record as defined in FOIA” didn’t stop Judge Scheindlin from looking to both state law and the FRCP for guidance.  Next, she relied on Aguilar, which noted that the Sedona Conference abandoned an earlier presumption against the production of metadata in recognition of “‘the need to produce reasonably accessible metadata that will enable the receiving party to have the same ability to access, search, and display the information as the producing party ….’”  She then foreshadowed her subsequent ruling by concluding: “[b]y now, it is well accepted, if not indisputable, that metadata is generally considered to be an integral part of an electronic record.”

The Government, not surprisingly, didn’t go down without a fight, arguing that “metadata is substantive information that must be explicitly requested and then reviewed by an agency for possible exemptions.”  In concert, they also claimed that “if the requirements of FOIA and the requirements of the Rules conflict, FOIA must trump the Rules.”  Judge Scheindlin wasn’t persuaded, holding that:

“[T]here is no need to decide this question because FOIA does not conflict with the Rules. FOIA is silent with respect to form of production, requiring only that the record be provided in ‘any form or format requested by the person if the record is readily reproducible by the agency in that form or format.’… Defendants’ productions to date have failed to comply with Rule 34 or with FOIA.”

In terms of the remedy for the government’s failure, she did cut them some slack:  “Because no metadata was specifically requested in Plaintiffs’ July 23 e-mail, and because this is an issue of first impression, I will not require Defendants to re-produce all of the records with metadata.”  But for future productions she held that the bulk of the ESI be produced in “TIFF image format but with corresponding load files, Bates stamping, and the preservation of “parent-child” relationships (i.e. the association between an attachment and its parent record)” citing the metadata list below for non-email files.

  1. Identifier
  2. File Name
  3. Custodian
  4. Source Device
  5. Source Path
  6. Production Path
  7. Modified Date
  8. Modified Time
  9. Time Offset Value

So, here’s the rub.  The legal populace, not surprisingly, likes bright line rules.  So, when Judge Scheindlin writes (in footnote 41):  “[w]hile not necessary to the holding in this case, I believe that these are the minimum fields of metadata that should accompany any production of a significant collection of ESI” it’s easy to see how the above nine fields may become a blunt instrument wielded haphazardly by requesting parties.   Not surprisingly, Judge Scheindlin is aware of her mantle and further tries to caveat her holding (in footnote 44):

“To be clear, my Order requiring the use of this Proposed Protocol for future productions-as amended by the specific metadata fields I have required and by the options I have offered the parties regarding the form of production for spreadsheets-is limited to this case. I am certainly not suggesting that the Proposed Protocol should be used as a standard production protocol in all cases. The production of individual static images on a small scale, where no automated review platform is likely to be used, may be perfectly reasonable depending on the scope and nature of the litigation.”

The impact of footnote 44 was top of mind when I recently spoke to Fmr. Judge Ron Hedges who chimed in:

“Attorneys must confer with regard to production requirements, as they should before bringing any dispute before a federal court. Moreover, attorneys should recognize that, as Judge Scheindlin said in footnote 44, that the selection of metadata fields to request are case-dependent.  Any attempt to arrive at a ‘universal’ or ‘bright line’ standard for production of metadata ignores the text of Rule 34(b) and the bargaining that occurs in meets-and-confers, and the unique aspects of individual civil actions.”

Although I agree with Judge Hedges’ sentiment, the main question in my mind is whether footnote 44 will be given its due weight going forward.  My concern is that, as is oft discussed with her Pension Committee decision, parties may home in on the bright line test and miss the nuances.  While it’s easy to argue against the folly of this thinking, it may not stop it from happening in the near term.

Finally, in another shout out to the Cooperation Proclamation, Judge Scheindlin takes a swipe at counsel, who forced her to rule on an “e-discovery issue that could have been avoided had the parties had the good sense to ‘meet and confer,’ ‘cooperate’ and generally make every effort to ‘communicate’ as to the form in which ESI would be produced.”

“The quoted words are found in opinion after opinion and yet lawyers fail to take the necessary steps to fulfill their obligations to each other and to the court. While certainly not rising to the level of a breach of an ethical obligation, such conduct certainly shows that all lawyers-even highly respected private lawyers, Government lawyers, and professors of law-need to make greater efforts to comply with the expectations that courts now demand of counsel with respect to expensive and time-consuming document production. Lawyers are all too ready to point the finger at the courts and the Rules for increasing the expense of litigation, but that expense could be greatly diminished if lawyers met their own obligations to ensure that document production is handled as expeditiously and inexpensively as possible. This can only be achieved through cooperation and communication.”

In the end, NDLON will continue to generate a ton of discussion (as did Zubulake and Pension Committee).  While this decision won’t single-handedly end the metadata discussion it will hopefully serve as a launching point for more clarity down the road.  For this, practitioners on both sides of the debate should be thankful.

How Do You Sample Electronically Stored Information (ESI) in E-Discovery?

Wednesday, February 9th, 2011

When confronted with an almost impossible data analysis problem, a tried and true technique to solve it has been the use of sampling. The mathematical analysis behind sampling has been studied for many years, and sampling has been put into practice for well over seventy years in many fields, from predicting election results to assessing the quality of electric bulbs. Why not do the same for certifying your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the Electronic Discovery Reference Model (EDRM) Search Group authors covered in detail, along with a strategy, in the comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). And, while much of that work has yet to hit the mainstream litigation scene as a general practice, I was pleasantly surprised to see it receive attention from a fellow blogger and litigator, Nick Brestoff, who highlighted this in a very thoughtfully crafted article in Law.com, titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties in getting a certifiable result that attorneys can stand behind. And, it is highly likely that the current practice is to certify your electronic discovery without a real measure of validity behind it.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the one who has the data, i.e., the producing party. While Rule 26(a) of the Federal Rules of Civil Procedure (FRCP) lists required disclosures and Rule 26(g) sets out signing and certification guidelines, there is no agreed upon way to specify sampling parameters or to report the results of sampling. It is in this context that Nick Brestoff’s article is significant – it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that there is a logistical problem with this – most litigators will balk at producing the largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to specify in their Rule 26(f) meet and confer that the production be certified for completeness by also including a statement on sampling and its results. A simple request such as, “Sample the data for 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing side can perform random sampling, per the sampling goals for the above request, selecting 13,526 documents (based on the sampling table of the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target.
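For readers who want to see where a number like this comes from, here is a rough sketch using the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e², with the worst-case assumption p = 0.5. This is not taken from the EDRM Search Guide; the function name and parameters are illustrative. Under this formula, 13,526 results from a 98% confidence level with z rounded to 2.326 and a margin of ±1%, which suggests (as an assumption on my part) that the “2% error” is being read as the total width of the interval.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, proportion=0.5, z=None):
    """Sample size for estimating a proportion (normal approximation).

    confidence: e.g. 0.98 for a 98% confidence level
    margin:     half-width of the interval, e.g. 0.01 for plus or minus 1%
    proportion: assumed responsive rate; 0.5 is the most conservative choice
    z:          optional pre-rounded z-score, as printed in many lookup tables
    """
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * proportion * (1 - proportion) / margin ** 2)


if __name__ == "__main__":
    print(sample_size(0.98, 0.01, z=2.326))  # 13526, using the rounded z-score
    print(sample_size(0.98, 0.02, z=2.326))  # 3382, if 2% is read as plus or minus 2%
    print(sample_size(0.98, 0.01))           # slightly higher with an unrounded z-score
```

The formula assumes a simple random sample from a large collection; for small collections a finite population correction would reduce the required sample.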

In addition to the EDRM Search Guide, The Sedona Conference Working Group commentary, Achieving Quality in the E-Discovery Process, is an indispensable resource for understanding the role of sampling. This paper discusses at length several sampling methods and their applicability for various purposes, including certifying that the results meet certain quality criteria. In addition, a number of electronic discovery cases have mentioned sampling as a way of overcoming the explosion of data volumes. A primary application of sampling is for evaluating proportionality claims, something that has moved from a simple assertion into an informed argument, with specificity on proving cost burden. Let’s examine a few.

Referring to the well-known Zubulake v. UBS Warburg, F.R.D. 280, the court in Makrakis v. Demelis, No. 09-706-C, 2010 WL 3004337 (July 13, 2010) ordered the producing party to essentially sample just a small number of backup tapes, at the expense of the requesting party. The decision is also remarkable for shifting the cost of processing and reviewing the sample, however small, to the requesting party. Such measures, while reducing the overall costs of e-discovery, place a greater burden of sample selection on the requesting party, forcing them to apply the reasonableness evaluation.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery is appropriate and quoted an earlier case, S.E.C. v. Collins & Aikman Corp., 256 F.R.D. 403, 418 (S.D.N.Y. 2009), that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling recommendation in this instance was both a reduction in the number of custodians from forty to three and a significant reduction in the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced drastically to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to perform a small set of search terms on a smaller number of custodians so as to get a sense for the larger electronic discovery costs. Clearone Communications v. Chiang offers another example of sampling by the use of Boolean logic to combine more common search terms thereby avoiding over-inclusiveness.

Per the Sedona commentary definitions, this type of sampling is referred to as “judgmental sampling” wherein the practitioner has a general sense of which custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery sampling can embrace the benefits of statistical sampling as well. It is a natural next step, as even with the narrow criteria of judgmental sampling, the cost of review can be high. One advantage of statistical sampling is that it yields quantifiable measures of error and confidence intervals, while judgmental sampling has no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality and if the producing party needs a basis for certifying their productions, statistical sampling can be a powerful aid.
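As an illustration of the “quantifiable measures” that statistical sampling offers, the sketch below estimates the responsive rate from a random sample and attaches a confidence interval. It uses the simple normal-approximation (Wald) interval, which is one common choice rather than a method prescribed by Sedona or the EDRM; the function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(hits, sample_size, confidence=0.95):
    """Estimate a proportion and its normal-approximation (Wald) interval.

    hits:        number of sampled documents coded responsive
    sample_size: number of documents in the random sample
    Returns (lower bound, point estimate, upper bound).
    """
    estimate = hits / sample_size
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(estimate * (1 - estimate) / sample_size)
    return max(0.0, estimate - half_width), estimate, min(1.0, estimate + half_width)


if __name__ == "__main__":
    # Hypothetical review: 120 responsive documents in a random sample of 1,000.
    low, rate, high = proportion_interval(120, 1000)
    print(f"responsive rate {rate:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

Multiplying the bounds by the size of the full collection gives a rough range for the number of responsive documents, which is the kind of statement a certifying attorney could attach to a production. The Wald interval becomes unreliable when the rate is very close to 0% or 100%, so a different interval may be preferable for very sparse collections.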

Moody v. Turner: An E-Discovery Battle with No Winners

Friday, December 3rd, 2010

The electronic discovery blogosphere is filled with analysis of the recent opinion by Judge Sandra Beckwith of U.S. District Court for the Southern District of Ohio, on the Moody v. Turner case. What is striking about the case is that it reveals a huge gap in understanding the pitfalls of prolonged discovery disputes in the context of attempts by thought leaders to prevent exactly the issues elicited in this opinion. As the excellent post by Ralph Losey indicates, in this case, it is an affront to have this play out in front of Judge Beckwith, a signatory to The Sedona Conference Cooperation Proclamation.

In reviewing the facts of the case, here are highlights on some of the process missteps:

Lack of Early Data Analysis

It is not obvious to some how important it is to perform an early analysis of the data before agreeing to search ESI for a certain number of custodians and apply certain keywords. This case illustrates three reasons why early data analysis is critically important.

First, the producing party must identify and communicate the right list of custodians. If there is any change or expansion of scope, that needs to be communicated as well. In this case, the Defense team, at their pre-trial 26(f) conference with the Plaintiffs, agreed to produce ESI for twenty-six custodians, but chose to send Preservation Notices to a larger number of individuals.  While this act by itself is commendable, the lack of prompt communication to the Plaintiffs is certainly a misstep that the Plaintiffs chose to latch on to as evidence of incomplete production of ESI.

Second, the producing party must have a handle on scope of searches before committing to “run them”.  In reviewing the document Case: 1:07-cv-00692-SSB Doc #: 43, Exhibit 7, it is apparent that the twenty production requests in that report are not trivial. An early analysis of both the data as well as searches at least on a small sample would have helped the producing party understand the scope and challenges of running those searches.

Third, the producing party must review its collection, search, and production methods to determine the feasibility of producing metadata. As evidenced in the Plaintiffs’ motion (Doc-89, Page 19), it is clear that the Defense did not produce TIF images along with searchable text. However, as noted in Doc-118 (Page 18, footnote 10):

“In any event, parties are generally not required to produce the metadata of their data sets. See Wyeth v. Impax Labs., Inc., No. 06-222, 2006 WL 3091331 at *2… Turner has produced all ESI in TIFF format, except for Excel spreadsheets which were produced in native format given the substantial size of many of the spreadsheets (which, if in TIFF format, may print across hundreds of pages). Judge Hogan therefore rightfully declined to compel Turner to produce any additional metadata.”

This is a fairly common request and one that the Plaintiffs could have raised in their pre-trial 26(f) conference.

Out of Control Production Requests

In reviewing the aforementioned court document, Doc #: 43, Exhibit 7, one can glean a wealth of information on the nature of searches requested by the Plaintiffs and the responses by the Defense team. The immediate problem evident in these requests is an issue raised by the Defense team – that the search requests are overly broad. Some of the search terms are “plan”, “method”, “rate” and “account”, which are certain to hit a very large number of documents. See below for one of the requests.

Production 1-Item 2: All documents other than emails that can be electronically or digitally searched as containing one or more terms that concern the Plan in any way or cash balance pension plans and contain the word “accrual,” “benefit,”, “benefit accrual,” “accrual of benefit”, “accrual methods,” … “calculate”, “calculation”.

This goes on and on, for about eighteen pages. Combined, the twenty production requests would clearly hit almost every collected document (a total of 118GB of documents), thus making a follow-on privilege or confidentiality review prohibitively expensive. It is the lack of specificity in these searches that makes the discovery request overly broad. On the other hand, the response from the Defense also appears to be poorly constructed. In their response, what we see is the same boilerplate text, which didn’t escape the notice of the Plaintiffs and the court.

“Defendants object to this Request because it is overly broad, unduly burdensome, seeks documents that are neither relevant nor likely to lead to the discovery of admissible documents and (because Plaintiffs define “documents” to include electronic or computerized data compilations) seeks electronic documents that are not reasonably accessible due to undue burden and/or cost. Defendants further object to this Request because it implicates documents protected by the attorney-client and/or work-product privilege and any such documents will be withheld from production”

What would have helped the Defense’s case would be actual data supporting their claims. For example, if the defendants had tabulated how often words such as “plan” and “benefit” occur and provided actual document and/or hit counts, it would have bolstered their claim. As expected, the lack of such data led the Plaintiffs to submit a further filing, Doc-89, with a host of complaints, chief among them:
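A tabulation of this kind does not require much machinery. The sketch below counts, for each search term, how many documents contain it and how many total hits it produces. It assumes the documents are already available as extracted text keyed by an identifier; the function name and the sample documents are hypothetical, and whole-word, case-insensitive matching is just one possible matching rule.

```python
import re


def term_hit_report(documents, terms):
    """Count documents and total hits per term (whole word, case-insensitive).

    documents: mapping of document identifier to extracted text
    terms:     list of search terms or phrases
    """
    report = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts = [len(pattern.findall(text)) for text in documents.values()]
        report[term] = {
            "documents": sum(1 for count in counts if count > 0),
            "hits": sum(counts),
        }
    return report


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = {
        "DOC-001": "The plan covers benefit accrual.",
        "DOC-002": "Plan rate and account plan.",
        "DOC-003": "Lunch menu for Friday.",
    }
    for term, counts in term_hit_report(docs, ["plan", "benefit"]).items():
        print(term, counts)
```

A table like this, run against even a small sample of the collection, turns “overly broad” from an assertion into a number the court can weigh.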

Defendants reported only (1) the total number of unique documents captured by the search of 17 terms and (2) the number of documents that contained the term “cash balance” but none of the Plaintiffs’ other terms. See Doc. 77-10 at 2.

Furthermore, the Plaintiffs appear to be on the right track, recommending:

On October 14, Plaintiffs wrote to Defendants and proposed an “iterative search process” to decide on a final set of search terms.

It seems clear that, in the ongoing discovery disputes, an iterative search process was perceived as contrary to zealous advocacy of their client’s positions and not as a path to resolving further disputes, much as the Cooperation Proclamation suggests. In this context, engaging a search expert is essential – someone who can modify the search to include more restrictive criteria to limit your search results. Why bother running an open-ended search and producing 29.4GB of useless junk, when you can combine these terms with Boolean, proximity, and other searches? The types of searches, and what each can offer, is a topic that the members of EDRM tackled in formulating their EDRM Search Guide, which is a must-read for anyone attempting to construct e-discovery searches.
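To show what combining terms can look like, here is a toy sketch of a Boolean query with a proximity condition. The query itself, (“benefit” within three words of “accrual”) AND “plan” AND NOT “newsletter”, is a hypothetical example and not one proposed in the case; real review platforms have their own query syntax, and the helper names here are illustrative.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(text, first, second, distance):
    """True if `first` and `second` occur within `distance` words of each other."""
    words = tokens(text)
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    return any(abs(a - b) <= distance for a in first_positions for b in second_positions)


def matches(text):
    """Hypothetical query: (benefit W/3 accrual) AND plan AND NOT newsletter."""
    words = set(tokens(text))
    return (
        within(text, "benefit", "accrual", 3)
        and "plan" in words
        and "newsletter" not in words
    )


if __name__ == "__main__":
    print(matches("The cash balance plan changed the accrual of each benefit."))
    print(matches("Our benefit fair is Friday; the plan is to bring snacks."))
```

The first sentence satisfies all three conditions, while the second contains “benefit” and “plan” but is filtered out by the proximity requirement, which is exactly the kind of narrowing that keeps a common word such as “plan” from sweeping in the whole collection.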

Proportionality Arguments Without Strong Basis

An important point to note is that any discovery request that uses inefficient processes and inappropriate technologies will certainly result in undue burdens and cost.  It appears that the Defense team did not offer proper cost estimates (arguments put forth in Doc-77-10 notwithstanding), and just pushed an undue burden/cost argument with the hope that the courts would absolve them of discovery obligations. At the same time, the Plaintiffs did seem to have over-reached a bit on extending their discovery disputes with the hope of reaching a favorable outcome. Two examples of such attempts are:

  1. Upon Defense producing the documents (Doc-118),

Turner has produced every responsive, non-privileged document obtained through the email ESI searches that related to the Plan; these comprise 4.1 GB, or more than 40,708 pages of documents.

The Plaintiffs counter with:

“Plaintiffs maintain that Turner should be compelled to produce the metadata for the email ESI it has produced because otherwise they allegedly “cannot know whether Defendants have searched all 33 custodians’ email files” and “cannot confirm whether any email files were electronic in origin (rather than printouts of emails) or determine whose files they came from.”

As noted earlier, requests for metadata and the feasibility of producing it must be negotiated specifically in the 26(f) conference.

  2. The attempt of the Plaintiffs to expand discovery, to compel any and every third party, including Defense’s former law firms, as well as inspect “shared network drives”, “non-shared drives” etc.

“Judge Hogan recognized that Turner should not be compelled to probe through the recesses of its internal electronic systems for even more ESI on top of the 47,000-plus hard copy documents and the 40,000-plus pages of ESI it has produced – because those additional searches are not likely to lead to the discovery of any evidence relevant to plaintiffs’ claims. Judge Hogan was presented with the gory history of Turner’s efforts to search through “shared network drives” and “non-shared drives,” emails and backups. He found these efforts to be sufficient, and rightly rejected plaintiffs’ demand for additional ESI.”

One can see that the Plaintiffs’ attempt to drag the electronic discovery efforts into an endless battle was counterproductive.

Final Takeaway

The Sedona Conference Cooperation Proclamation rightfully recommends “Jointly developing automated search and retrieval methodologies to cull relevant information”. As costs for getting to the facts escalate, a comprehensive strategy that uses the best processes, the best technology, and a commitment to the Cooperation Proclamation is essential for the legal system to deliver what people expect – justice based on facts. Gamesmanship as evidenced in Moody v. Turner is detrimental to this cause.

Fulbright Litigation Survey Calls Out Need for More Proportionality/Rules Changes

Thursday, November 11th, 2010

Fulbright & Jaworski recently issued its “7th Annual Litigation Trends Survey Report” and there were several interesting trends worth noting.   Not surprisingly, the general pace of litigation is forecast to increase, relatively unabated, with more than 25% of respondents expecting their companies’ disputes to increase in the next 12 months.

Beyond this trend it’s clear that there’s also a groundswell of support for a movement towards more e-discovery proportionality.  While also a big topic at Sedona’s annual conference (and discussed in the recent Moody case), a whopping 79% of US respondents think the “US Rules of Civil Procedure should be modified in some way to limit e-discovery in civil cases.”  While I haven’t heard of any specific proposals for a rules amendment, it’s clear that folks aren’t happy with the status quo, particularly with the increasing discovery burden facing enterprises dealing with unilateral disputes.   This discontent is likely tied to the fact that costs continue to escalate, with the survey indicating that more than 40% of the largest US companies (over $1B in Revenue) plan to “increase their spending on e-discovery in the next 12 months.”

Finally, the survey also focused on an area that’s getting an increasing level of scrutiny.   Fulbright asked “when preserving potentially relevant information in litigation or an investigation, what methods do you use most frequently for preserving electronically stored information?”  Leading the pack, with 55% of the vote, was “rely on individual custodians to identify and preserve their own information.”  Custodian based collections have been discussed recently as being under fire in blogs and in recent cases such as Pension Committee and Ford Motor Co. v. Edgewood Properties Inc. The notion is that under- or un-supervised collection methodologies are dangerous because it’s relatively easy to paint the custodians at issue as either being motivated to hide responsive data or relatively unconcerned with compliance.  Nevertheless, it’s clear that (as of now) custodian-based collections are still somewhat “reasonable” given that more than 50% of the populace collects data this way.

On the other side of the spectrum from custodian based ESI collections, there are automated data collection tools and methods that can be considered too.  There are undoubtedly advantages (risk reduction, speed, audit trails, etc.) to using “automated search software” for the collection of data (like 43% of the respondents did in the Fulbright survey).  Yet, it’s clear this isn’t a zero sum game – meaning there’s currently a place for both methodologies in the legal landscape.  For many organizations it becomes a risk management exercise as summarized in a recent ARMA article entitled “Is ‘Manual’ Collection of ESI Defensible?”: “Companies may choose the manual collection of ESI to reduce costs, particularly if they have limited levels of litigation or lower risk levels posed by the litigation itself.”

In the end, like so many aspects of electronic discovery, almost any well thought out, well documented methodology *can* be defensible, but the onus is on the preserving/collecting party to buttress whatever poison they pick.  Defaulting into a method without preparation, auditing and follow-through is a recipe for disaster.

Manual Collections of ESI in Electronic Discovery Come under Fire

Monday, May 17th, 2010

Jason R. Baron was a keynote speaker at a recent electronic discovery summit and he mentioned an electronic data discovery topic that “ought to be blogged about.”  So, with that kind of softball I had to take a swing, particularly because it’s been a topic we (at e-discovery 2.0) have been discussing lately.

The genesis of this blog (per Jason) is the recent “skepticism” evidenced by the bench regarding the defensibility of custodian based collections.  ARMA has a good piece on this very topic, entitled “Is ‘Manual’ Collection of ESI Defensible?”  The core notion is that the tried and true practice of custodian based ESI collection is now under fire by courts, which appear to be looking at this practice with an increasing level of distrust.

“While it is common for companies to use automated data-collection software and hardware, some corporate litigants opt for more informal, “manual” collection methods (i.e., searches performed by individual records custodians) when responding to ESI requests. Companies may choose the manual collection of ESI to reduce costs, particularly if they have limited levels of litigation or lower risk levels posed by the litigation itself.”

While there’s no dispute that the “automated” collection methods available in litigation software referenced above have a number of features that make this approach more efficient, the question is whether a “manual” (i.e., custodian based) collection process is somehow less defensible.  If this is truly the case, then many midsized companies without the budget to purchase such e-discovery applications will inherently be found deficient – which is a daunting notion.

Take the recent case of Ford Motor Co. v. Edgewood Properties Inc., 257 F.R.D. 418 (D.N.J. 2009) where the dispute arose out of the demolition of a Ford assembly plant in New Jersey.  Ford and Edgewood entered into a contract whereby Ford agreed to provide 50,000 cubic yards of concrete to Edgewood in exchange for Edgewood removing it from the site.  When the concrete turned out to be contaminated, the dispute started in earnest.

The crux of Edgewood’s complaint was that it was unhappy with Ford’s production and somehow suspected that the dearth of documents was due to the electronic data collection process.  Edgewood sought to “‘confirm the adequacy of Ford’s manual document collection process’ by using a third-party vendor to perform keyword searches on documents not in the existing repository of ESI, but instead, documents within the possession of certain Ford custodians.”

To reconcile the dispute the court looked to the Sedona Conference’s work in the area:

“In The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, Practice Point 1 states that “[i]n many settings involving electronically stored information, reliance solely on a manual search process for the purpose of finding responsive documents may be infeasible or unwarranted. In such cases, the use of automated search methods should be viewed as reasonable, valuable, and even necessary.”(emphasis added). Once again, the Court confronts this peculiar situation insofar as Edgewood has a point that the document collection method used by Ford is not necessarily contemplated under the Sedona Principles, but that agreement by the parties at the outset as to the mode of collection would have been the proper and efficacious course of action.  However, “[a]bsen[t] agreement, a [responding] party has the presumption, under Sedona Principle 6, that it is in the best position to choose an appropriate method of searching and culling data.”

Accordingly, the court found that the lack of agreement, coupled with Ford being in the best position to make a call about the methodology, was a deciding factor in generally upholding Ford’s manual collection process.

“It would be improvident at this juncture to grant Edgewood the relief it seeks when it has not shown any indicia of bad faith on the part of Ford. To countenance such a holding would unreasonably put the shoe on the other foot and require a producing party to go to herculean and costly lengths (especially in a document-heavy case such as this) in the face of mere accusation to rebut a claim of withholding. This scenario is not contemplated by the Federal Rules.”

While Ford wasn’t penalized for its manual collection, this practice has come under fire in several other opinions.  In the highly controversial case of Phillip M. Adams & Assoc., LLC v. Dell, Inc., 621 F. Supp. 2d 1173 (D. Utah 2009), custodian based collection/preservation policies drew similar criticism.

“ASUS’ practices invite the abuse of rights of others, because the practices tend toward loss of data. The practices place operations-level employees in the position of deciding what information is relevant to the enterprise and its data retention needs. ASUS alone bears responsibility for the absence of evidence it would be expected to possess. While Adams has not shown ASUS mounted a destructive effort aimed at evidence affecting Adams or at evidence of ASUS’ wrongful use of intellectual property, it is clear that ASUS’ lack of a retention policy and irresponsible data retention practices are responsible for the loss of significant data.”

Adams was in fact cited by Judge Scheindlin in her latest opus, Pension Comm. of the Univ. of Montreal Pension Plan v. Banc of America Sec. LLC, No. 05 Civ. 9016, 2010 U.S. Dist. LEXIS 4546, at *1 (S.D.N.Y. Jan. 15, 2010), where she found fault with the plaintiffs’ reliance on manual collections:

“This instruction does not meet the standard for a litigation hold. It does not direct employees to preserve all relevant records – both paper and electronic – nor does it create a mechanism for collecting the preserved records so that they can be searched by someone other than the employee.  Rather, the directive places total reliance on the employee to search and select what that employee believed to be responsive records without any supervision from Counsel.”

From the foregoing, it’s probably too early to call the skepticism over manual collection a trend per se.  Certainly, lobbing a preservation notice over the proverbial wall to custodians without the requisite level of supervision is a recipe for disaster.  Education (about the matter and the required tasks), compliance (with the preservation instructions) and ongoing monitoring (to ensure that compliance continues over time) are all critical responsibilities that must be thoughtfully undertaken by counsel for a defensible ediscovery process.

The question then becomes, is the problem here really about the “manual” collection efforts by the custodians or more simply the fact that they aren’t supervised with the requisite degree of care?  If the latter, which I’d opine is the case, then “properly executed” manual collections should be fine (i.e., defensible).

But, as Ford indicates, if your company is going to rely upon a manual collection modus operandi, then it may be advisable to let the opposition in on the use of this tactic.  This approach may be mandated by local rule or it may just be the type of transparent cooperation that’s all the rage these days.

Learn More On Litigation Support Software & Electronic Discovery Litigation

New York State Court Issues Report Calling for Extreme E-Discovery Makeover

Wednesday, April 28th, 2010

The New York state court system looked in the mirror recently and didn’t like what it saw.  While it’s hard to imagine the self-dubbed “center of the universe” finding flaws with anything… apparently e-discovery has caused the Big Apple to take serious stock of the situation.  In a 24-page report entitled ELECTRONIC DISCOVERY in the NEW YORK STATE COURTS, Chief Judge Jonathan Lippman and Chief Administrative Judge Ann Pfau do an excellent job laying out the nature of the problem.  Their initial findings in many ways mirror those of the American College of Trial Lawyers Task Force on Discovery (“Task Force”) and their survey of the Fellows of the American College of Trial Lawyers (“ACTL”).

“Electronic discovery (“e-discovery”) has for some time been changing the face of modern litigation. It is a major, if not the predominant, factor behind rising litigation costs and delays and presents serious challenges to the court system’s ability to resolve disputes ranging from commercial matters to personal injury cases, in an efficient, cost-effective manner.”

Fortunately, the Report recognizes the ubiquity of the vexing e-discovery challenges.

“[T]he volume of electronically stored information (“ESI”) has increased exponentially over the last decade, along with the amount of ESI potentially relevant to legal disputes. But while it is inexpensive to store immense quantities of ESI, it can be extremely expensive in the context of litigation to identify, preserve, and collect potentially relevant ESI and to have it reviewed for responsiveness and privilege by attorneys and paralegals prior to production to another party.”

But surprisingly, the authors have taken these shortcomings personally, and they see the problem as serious enough to threaten New York’s standing in the legal community.

“Interviews with leading judges, law clerks, and practicing lawyers from around the state strongly suggest that the New York court system’s standing as a leading forum of both national and international litigation is at stake. … Those same parties and lawyers appear to be turning away from New York State courts for the greater sense of certainty and ability to handle massive e-discovery disputes that the Federal courts, and to a lesser extent, other state courts with more developed e-discovery practices, can provide.”

The report, founded upon “extensive research and interviews with experts in electronic discovery,” addresses the problems of electronic discovery, including cost and delay, and provides several recommendations on how “the courts can manage e-discovery in a more expert, efficient and cost-effective manner within the framework of existing law.”

1. Establish an E-Discovery Working Group

This proposed step is one of the more interesting since the goal is to create “a working group of e-discovery experts that would serve as a resource for the court system and support its efforts to improve the management of e-discovery.”  This Working Group would have a very expansive (perhaps too much so) roster:

  • Judges, court attorneys, and court clerks drawn from both the Commercial Division and other courts around the state that handle electronic discovery issues (and perhaps one or more judges/court personnel with little or no e-discovery experience);
  • Lawyers with extensive experience litigating cases involving large volumes of ESI;
  • One or more CPLR Advisory Committee members with an electronic discovery background;
  • Medical malpractice, matrimonial, criminal, mass tort, and employment law practitioners, because of the increasing frequency and importance of electronic discovery in these practice areas;
  • General counsel familiar with the issues affecting corporate clients who are heavy-ESI producers, particularly in the financial services and health care industries;
  • Forensic computer/e-discovery specialists who typically are hired for large electronic discovery productions, but can share their substantive technical knowledge and familiarity with the latest technological/forensic trends;
  • A mix of newer and more experienced practitioners, including one or two more experienced practitioners with limited technical proficiency;
  • Bar association representatives who have studied and issued reports on electronic discovery;
  • Federal practitioners and/or federal magistrates to offer the federal courts’ perspective;
  • An academic who has studied and written about electronic discovery;
  • Representatives of the Advisory Group to the New York State and Federal Judicial Council, which works to promote awareness about differences and commonalities in law practice between the state and federal judiciaries;
  • A member of The Sedona Conference®, a national group of jurists, lawyers, experts and academics considered to be at the cutting edge of electronic discovery issues;
  • Representatives of the Attorney General’s and/or District Attorneys’ Offices who are familiar with how electronic discovery is affecting their caseloads.

Assuming they can put together this dream team, the next challenge (beyond finding times to meet) would be to harmonize all the differing perspectives, which certainly won’t be easy.

2. Improve the Preliminary Conference

The Preliminary Conference (“PC”) was roundly felt to have value, but there were both short term and long term recommendations for change.  In the near term, the Report concludes that new language should be added to Commercial Division Uniform Rule 1 and to Rule 202.12(c)(3) stating that:

“Counsel appearing at the PC should be sufficiently versed in matters relating to their client’s technological systems to competently discuss with the court and opposing counsel all issues relating to e-discovery. Counsel may, in appropriate cases, supplement their ability to address these issues at the PC by bringing a client representative or outside expert with such knowledge.”

Assuming the short term fixes don’t remediate things completely, the Report recommends two additional steps, each to be piloted.  First, one pilot project should require an Initial Disclosure (similar to FRCP 26(a)(1)) for all parties relating to electronic discovery issues, which would require the parties to detail the following, in advance of the PC:

  • Who the party’s key IT people are;
  • Whether, and to what extent, the party has implemented preservation measures to avoid spoliation of the information relevant to this case;
  • Which substantive witnesses the party is likely to call who are likely to possess ESI, and the location of that ESI (e.g., laptops, wireless handheld devices);
  • What types of computer systems (including e-mail, word processing and spreadsheet software) and other technologies the party uses that may have created documents relevant to the litigation; and
  • Whether the party expects to claim that certain ESI relevant to the case is inaccessible due to the form in which it is maintained (e.g., disaster recovery backup tapes, legacy data).

The other pilot program would require an “Affirmation of E-Discovery Compliance” that would be jointly signed and certified by the lawyers for each party, and provide the court with three lists.

“The first list would contain those e-discovery matters, contained in Rule 8(b) or Rule 202.12(c)(3), which the parties were able to meet-and-confer about and resolve. The second list would contain similar matters that, despite meeting and conferring, the parties could not agree upon or resolve and that need the court’s involvement. The third list would be any additional issues that, because of the disagreements described in the second list, the parties could not yet reach and resolve. The document would also chronicle the parties’ attempts to meet-and-confer, and indicate whether, and to what extent, client personnel and IT specialists were involved.”

While there are a few other minor suggestions, one of the most interesting is the shout out to The Sedona Conference®.  The Report concludes that “judges and practitioners applauded the work of The Sedona Conference®, particularly its emphasis on changing the litigation culture and fostering dialogue, cooperation, and transparency in e-discovery.”  The Report recommends appointing a representative to The Sedona Conference®, which, despite the foregoing, “should not be interpreted to mean that the court system necessarily endorses that organization’s work and proposals. Rather, the court system’s appointee would bring back materials for consideration here in New York, to be accepted, rejected, or modified, as appropriate.”

All in all, the New York state court system appears to have taken a reasoned and measured approach to candidly addressing its shortcomings.  More jurisdictions should undertake this type of critical analysis to determine where process gaps still exist.  Only then can a better future state be divined.
