Is there a place in eDiscovery today for hard drive imaging and bit-by-bit copies, which collect deleted items and slack or unused hard disk space? The answer is yes, with some important limitations. For the vast majority of matters, ESI can be collected without imaging drives or utilizing proprietary container files. However, I occasionally still encounter folks who are victims of the dated and costly misconception that eDiscovery always requires the bit-level imaging of hard drives.
There are situations, though, where the existence of data (as opposed to its content) is central to the matter – when companies suspect employees of stealing proprietary information or when employees leave a company under suspicious circumstances. In these and other similar situations, it may make sense to have the employee’s workstation hard drive imaged for full forensic analysis. Even in these scenarios, I find that companies are more likely to hire an external investigator to perform this task to allay suspicions of tampering or bias, and the company generally would prefer that this investigator be the one to testify about this sensitive data acquisition. Then, for ESI beyond the target employee’s hard drive, other collection methods may be used. As we’re now midway through 2011 – a year in which I expect to see eDiscovery fully embraced by many corporations as a true business process – I wanted to analyze why the forensic disk image myth still exists, where it came from, and what the law really requires of an eDiscovery collections process.
Traditionally, cases that mentioned full forensic imaging of hard drives began their captions with United States v. or State v. because they were criminal matters. In traditional civil litigation – even the behemoth eDiscovery cases that get all the bloggers blogging – forensic imaging simply is not required or needed. In fact, in most cases, it will dramatically increase the cost associated with electronic discovery, because it adds unnecessary complexity in downstream phases of eDiscovery and leads to vast over-collection. Why collect the Microsoft Office suite 50 times when what you are really required to preserve and collect are the files created with those programs? When using disk imaging, program files are collected, which drives up storage costs and requires the post-collection step of deNISTing (removing system files based on the NIST list). Why not leave those system files behind and perform a targeted collection of only user-created content? In addition, the primary rules governing civil litigation – the Federal Rules of Civil Procedure and Federal Rules of Evidence – simply do not require exact duplication of electronic files. I am amazed that there are so many experts who are still pushing full forensic imaging and duplication in every case. In fact, this goes against best practices published by The Sedona Conference, by EDRM, and in the E-Discovery textbook co-authored by Judge Shira A. Scheindlin.
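To make the deNISTing step concrete, here is a minimal sketch of the idea: hash each collected file and drop any file whose hash appears in a list of known program and system files. The function names, the `known_hashes` set, and the choice of SHA-1 are assumptions for illustration only; a real workflow would load the hash set published by NIST and would typically run inside a processing tool rather than a script.

```python
import hashlib
from pathlib import Path


def file_hash(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def denist(paths, known_hashes, algorithm="sha1"):
    """Return only the files whose hash is NOT in the known-file list.

    known_hashes is a hypothetical set of hex digests for known
    program/system files; comparison is case-insensitive.
    """
    known = {h.lower() for h in known_hashes}
    return [p for p in paths if file_hash(p, algorithm) not in known]
```

A targeted collection of user-created content avoids most of this filtering in the first place, which is the point of the paragraph above.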
In comment 8c of the Sedona Principles, the authors call making forensic image backups of computers “the first step of an expensive, complex, and difficult process of data analysis that can divert litigation into side issues and satellite disputes involving the interpretation of potentially ambiguous forensic evidence.” The comment goes on to say that “it should not be required unless exceptional circumstances warrant the extraordinary cost and burden.” In a whitepaper authored for EDRM by three eDiscovery experts from KPMG, LLC, the authors discussed the high cost of forensic bit-level imaging and, instead, suggested that targeted collection of ESI would be sufficient in the vast majority of non-criminal matters. They state, “[t]he challenge of Smart EDM [Evidence and Discovery Management] is to obtain targeted files in a forensically sound manner – chain-of-custody established, proven provenance, and metadata intact – without having to resort to drive imaging.”
In Electronic Discovery and Digital Evidence: Cases and Materials, written by Judge Shira A. Scheindlin, Daniel J. Capra, and The Sedona Conference, the authors state that,
“because imaging software is commonly available, and because the vast majority of training programs in the field of electronic discovery revolve around forensics, there is a growing tendency to want to ‘image everything.’ But unless an argument can be made that the matter at hand will benefit from a forensic collection and additional examination, there is no reason to do a forensic collection just because the technology exists to do it.”
So, with the top experts in the field saying the days of “image everything” should be over, why does it still happen? Why are the victims of this antiquated workflow still paying the exorbitant costs of a solution that does not really meet their requirements? Perhaps a historical perspective will be helpful in explaining.
Why Drive Imaging and Proprietary Containers?
I do not think there is any debate on the benefit of having a bit-level image of a hard drive in a criminal investigation. However, traditionally, the investigators using these methods needed a way to get the imaged drive safely back to a lab for further analysis. Companies or law enforcement agencies that hired third-party investigators to image drives had to transport the data, maintaining chain of custody, and preserving all contents in an un-alterable state through several phases of the investigation. And, in criminal matters, it was especially important to maintain the integrity of the evidence when the electronic evidence was central to the government’s case. Remember, the burden of proof in a criminal matter is “beyond a reasonable doubt” (along with a host of constitutional considerations). Alteration of key evidence could certainly create reasonable doubt and hose the prosecution’s case (or, worse, the evidence gets tossed by the Court before the trial even begins). The container file ensures that no matter who handles the evidence, checksums can prove that the contents were not altered since the initial imaging.
Many vendors now offer logical image containers as an alternative to doing a full bit-level image of the drive. However, in corporate eDiscovery, this is still overkill because the tools and solutions being used downstream still have to unpack or parse these proprietary container formats for processing and analysis. In fact, even software from the vendors who created these container formats must “crack them open” to get to the contents within. This seems to add a layer of complexity that has not been needed since the days of the external examiner coming in with her forensic toolkit to do drive images. The format was created to solve a very specific problem, and little thought was given to the use of this format in a holistic process like what is typically seen in civil eDiscovery. There is no longer a need for a container for portability of evidence because it is most likely going to be processed in place after collection while residing on a secure evidence store on the company’s network. I have heard “what if our collections methods are challenged?” And to that, I would respond that we are not in criminal court and that the requirement in civil court is reasonableness, not perfection. Now, if an employee is suspected of wrongdoing and the potential deletion of files will dramatically alter the case, then by all means, hire a forensic investigator and follow all of the protocols established over the last several decades in computer forensic science.
Fast forward to the 21st century
Corporations are bringing eDiscovery in-house; they are building a business process around it to minimize risk and drive enormous cost savings, and in today’s world of civil litigation, there simply is not a need for these drive images or proprietary containers. First of all, the burden of proof in a civil matter is “by a preponderance of the evidence.” What this means is that the burden is satisfied if there is a greater than 50% chance that a proposition is true. This is a much lower standard than in criminal cases. But the burden of proof goes more to the weight that evidence is given by the court or jury. Before that is even considered, evidence must pass several hurdles of admissibility. As we will explore, these standards of admissibility have also been the recipients of significant bolstering from vendors over the years.
The Path to Admissibility
There are several hurdles to admissibility for any type of evidence, and because they are not within the scope of this post, I will forgo any discussion of relevance, FRE 403, or the hearsay rules. I will focus on the issues that tend to be associated with electronic evidence: authentication and the “best evidence rule”. There are some examiners and perhaps even vendors that would argue electronic evidence is simply not admissible if not collected using bit-level imaging (and sometimes two copies – one that is referred to by examiners as the “best evidence” copy and another “working copy” to be analyzed). This is simply not true. What we will find is that the collection method will go more to the weight of the evidence rather than the minimum showing needed for admissibility (hence, the discussion of burden of proof above).
All evidence must be authenticated pursuant to FRE 901. This is a “don’t pass Go” threshold requirement for admissibility. FRE 901 is satisfied by “evidence sufficient to support a finding that the matter in question is what its proponent claims.” Unless a piece of evidence is “self-authenticating” pursuant to FRE 902, the proponent must establish the identity of the exhibit by stipulation, circumstantial evidence, or the testimony of a witness with knowledge of its identity and authorship. Typically, objections to this process would tend to go toward whether the exhibit is an original, was altered, or the witness with whom the proponent is attempting to authenticate the exhibit is not able to do so based on lack of personal knowledge or some other defect. Mostly these objections deal with the authenticity of the contents of the exhibit, and the rules in Article X of the FRE are helpful here. Rule 1001 defines an “original” with respect to data stored in a computer or similar device as “any printout or other output readable by sight, shown to reflect the data accurately.” This is a far cry from a bit-by-bit forensic image! Rule 1002 – often referred to as the “Best Evidence Rule” – requires that “[t]o prove the content of a writing, recording, or photograph, the original writing, recording, or photograph is required, except as otherwise provided in these rules or by Act of Congress.” Not only do these rules not require exact duplication of the electronic files, but they do not require imaging the entire 80GB hard drive to collect the 100MB of files that are potentially relevant to the case. What they do require, though, is the ability to show that a document being proffered is the same document that was originally created. See In re Vee Vinhnee, 336 B.R. 437, 444 (B.A.P. 9th Cir. 2005). Also, Judge Grimm sets out an extremely comprehensive analysis of what is required for the admissibility of electronic evidence in civil litigation in Lorraine v. Markel American Insurance Company, 241 F.R.D. 534 (D. Md. May 4, 2007). In Lorraine, he notes that In re Vee Vinhnee may set out the most demanding test for admissibility of ESI.
Maintaining Forensic Integrity
So, how do I combat the claims that “they must have altered that document” or “Your Honor, I swear that line about ‘acceptable losses’ was not in the safety memo when I created it”? This is where the hash value becomes a wonderful thing. Computing the hash of an electronic file (that is, computing a hexadecimal checksum from the contents of the file) is essentially like recording the DNA of that file. If the file is altered, its hash value would be different. So, by computing the hash value at the source, in transit, and at the destination, I can ensure that the electronic file is in exactly the same state as it was at the source (or, that the collected document is the same as the document originally created). Now, add the ability to report on that information, and those container files and full forensic disk images really do become extreme overkill.
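As a rough illustration of hashing at the source and again at the destination, the sketch below copies one file into an evidence store and refuses to accept the copy unless the two hash values match. The function names, the returned record layout, and the use of SHA-256 are assumptions made for this example, not a description of any particular collection product.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_file(source, dest_dir):
    """Copy one file into dest_dir and verify the copy by hash.

    Returns a small record (hypothetical layout) that can be reported on later.
    """
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    source_hash = sha256_of(source)      # hash at the source
    dest = dest_dir / source.name
    shutil.copy2(source, dest)           # copy2 also attempts to keep timestamps
    dest_hash = sha256_of(dest)          # hash at the destination

    if dest_hash != source_hash:
        raise ValueError(f"hash mismatch after copying {source}")
    return {"source": str(source), "destination": str(dest), "sha256": source_hash}
```

Re-running `sha256_of` on the stored copy at any later point and comparing it with the recorded value shows whether the file has changed since collection.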
The important distinction here is that the term “forensic” does not refer to a type of technology or the products of a specific vendor – despite claims and propaganda to the contrary. Forensic refers to the methodology used by the person collecting the evidence – whether it is fingerprints from a weapon or electronic files from an employee’s laptop. Forensic imaging, however, refers to the process by which an entire hard disk is copied bit by bit to create an exact duplicate of that hard drive in a forensic manner. It is entirely possible for a collection of ESI to be “forensically sound” by simply employing the technique described above of taking hash values at each stage of the process to be able to prove that the files were not altered during collection. As long as chain of custody is also maintained (much easier to do now that we are not using multiple tools, vendors, locations, and people to do the job), then the process should meet the threshold admissibility requirements of the Federal Rules of Evidence.
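One way to picture hash values working together with chain of custody is a simple log with one entry per stage, checked for the first point at which the recorded hash stops matching the hash taken at the source. This is only a sketch: the `CustodyEntry` fields and the `first_break` helper are hypothetical names, and a real custody record would carry more detail (timestamps, locations, signatures).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustodyEntry:
    stage: str      # e.g. "source", "transit", "destination"
    handler: str    # who had the file at this stage
    sha256: str     # hash value recorded at this stage


def first_break(entries):
    """Return the first entry whose hash differs from the first recorded hash.

    Returns None when the log is empty or every stage agrees with the source.
    """
    if not entries:
        return None
    baseline = entries[0].sha256.lower()
    for entry in entries[1:]:
        if entry.sha256.lower() != baseline:
            return entry
    return None
```

A `None` result supports the claim that the file was unchanged through every recorded stage; any other result points to the stage and handler where the discrepancy first appears.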
Opponents will still bring up claims that the evidence must have been altered, or the expert familiar only with forensic imaging technologies will try to use the argument that only vendor X’s technology is “court vetted,” so any other method is not acceptable. But, to these opponents, I would argue two points:
- No technology is “court vetted”. The operator’s use of the technology in the specific case (in a specific jurisdiction) was acceptable to the court to meet the threshold showings required by FRE 901, 1001, and 1002 – as well as any rules of procedure governing the production of discovery in either a civil or criminal matter. Wow – that would be a very long footnote on a marketing slide…probably why it is not usually mentioned.
- The process is forensically sound, and you can prove that the documents were not altered from collection through production by referencing the hash value and maintaining copies of the original native files analyzed on a secured preservation store. This would exceed the requirements of FRE 901, 1001, and 1002 – but would provide protection against claims going to the “weight” of the evidence by opponents who would cry foul.
What Now?
So, where does all of this leave us? First, in the vast majority of civil litigation matters where electronic discovery is being performed, forensic bit-by-bit imaging of computer hard drives is simply not required. Vendors have promoted this practice over the years, but all this has done is over-complicate the eDiscovery process for many unsuspecting litigants and dramatically increase costs because the model simply does not scale. Moreover, the effort and cost required to deal with these full drive images downstream in the process are often overlooked by these vendors and overzealous consultants. Next, we now know there is a better way: targeted, forensically sound collection of ESI using streamlined and automated solutions that maintain custodian relationships, even for shared data sources, throughout the eDiscovery lifecycle, preventing form-of-production disputes and other calamities that have plagued this industry for the last decade. There is a better way to collect ESI that will provide substantial cost savings all the way to production.