Archive for February, 2011

Judge Scheindlin Decides that the Metadata is “Integral” in FOIA Case: Fmr. Judge Ron Hedges Weighs In

Monday, February 28th, 2011

Just as when Judge Scheindlin penned Pension Committee, her latest opinion is already garnering a ton of buzz.  In Nat. Day Laborer Org. Network v. United States Immigration and Customs Enforcement Agency (“NDLON”), 2011 WL 381625 (S.D.N.Y. Feb. 7, 2011), Judge Scheindlin boldly takes on four governmental agencies (ICE, the Department of Homeland Security, the Federal Bureau of Investigation, and the Office of Legal Counsel) over metadata production in response to FOIA demands.

In NDLON, Plaintiffs submitted identical twenty-one page FOIA requests to each of the four defendant agencies.  After some initial missed deadlines and judicial intervention, Plaintiffs sent the defendants a proposed protocol that requested a specific format for the production of electronic records.  Significantly, the proposed protocol was based on the “format demands routinely made by two government entities-the Securities and Exchange Commission and the Department of Justice Criminal Division” (invoking the old “good for the goose” argument).

Before ruling on the protocol, Judge Scheindlin examined the parties’ efforts to cooperate and she was uniformly underwhelmed:

“As far as I can tell from the record submitted by the parties, the equivalent of a Rule 26(f) conference, at which the parties are required to discuss form of production, was not held and no agreement regarding form of production was ever reached. Nor was a dispute regarding form of production brought to the Court for resolution.”

In evaluating controlling law, the fact that “[n]o federal court has yet recognized that metadata is part of a public record as defined in FOIA” didn’t stop Judge Scheindlin from looking to both state law and the FRCP for guidance.  Next, she relied on Aguilar, which noted that the Sedona Conference abandoned an earlier presumption against the production of metadata in recognition of “‘the need to produce reasonably accessible metadata that will enable the receiving party to have the same ability to access, search, and display the information as the producing party ….’”  She then foreshadowed her subsequent ruling by concluding: “[b]y now, it is well accepted, if not indisputable, that metadata is generally considered to be an integral part of an electronic record.”

The Government, not surprisingly, didn’t go down without a fight, arguing that “metadata is substantive information that must be explicitly requested and then reviewed by an agency for possible exemptions.”  In concert, they also claimed that “if the requirements of FOIA and the requirements of the Rules conflict, FOIA must trump the Rules.”  Judge Scheindlin wasn’t persuaded, holding that:

“[T]here is no need to decide this question because FOIA does not conflict with the Rules. FOIA is silent with respect to form of production, requiring only that the record be provided in ‘any form or format requested by the person if the record is readily reproducible by the agency in that form or format.’… Defendants’ productions to date have failed to comply with Rule 34 or with FOIA.”

In terms of the remedy for the government’s failure, she did cut them some slack:  “Because no metadata was specifically requested in Plaintiffs’ July 23 e-mail, and because this is an issue of first impression, I will not require Defendants to re-produce all of the records with metadata.”  But for future productions she held that the bulk of the ESI be produced in “TIFF image format but with corresponding load files, Bates stamping, and the preservation of “parent-child” relationships (i.e. the association between an attachment and its parent record)” citing the metadata list below for non-email files.

  1. Identifier
  2. File Name
  3. Custodian
  4. Source Device
  5. Source Path
  6. Production Path
  7. Modified Date
  8. Modified Time
  9. Time Offset Value
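
To make the nine fields concrete, here is a minimal Python sketch of how one row of a delimited load file for a non-email file might be assembled. The field names come from the list above; everything else (the helper names `build_record` and `write_load_file`, the CSV layout, and the choice to normalize times to UTC) is an illustrative assumption, not something prescribed by the opinion or by any particular review platform.

```python
import csv
import io
import os
import time

# The nine non-email metadata fields listed in the opinion.
FIELDS = [
    "Identifier", "File Name", "Custodian", "Source Device", "Source Path",
    "Production Path", "Modified Date", "Modified Time", "Time Offset Value",
]

def build_record(identifier, custodian, source_device, source_path, production_path):
    """Return one load-file row (a dict keyed by FIELDS) for a file on disk."""
    modified = time.gmtime(os.stat(source_path).st_mtime)
    return {
        "Identifier": identifier,
        "File Name": os.path.basename(source_path),
        "Custodian": custodian,
        "Source Device": source_device,
        "Source Path": source_path,
        "Production Path": production_path,
        "Modified Date": time.strftime("%Y-%m-%d", modified),
        "Modified Time": time.strftime("%H:%M:%S", modified),
        # Assumption: times are reported in UTC, so the offset is always zero.
        "Time Offset Value": "+00:00",
    }

def write_load_file(records):
    """Serialize records to CSV text with a header row in FIELDS order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

In practice the delimiter, date format, and time zone convention are exactly the kind of details the parties would settle when they confer on form of production.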

So, here’s the rub.  The legal populace, not surprisingly, likes bright line rules.  So, when Judge Scheindlin writes (in footnote 41):  “[w]hile not necessary to the holding in this case, I believe that these are the minimum fields of metadata that should accompany any production of a significant collection of ESI” it’s easy to see how the above nine fields may become a blunt instrument wielded haphazardly by requesting parties.   Judge Scheindlin is clearly aware of her mantle and further tries to caveat her holding (in footnote 44):

“To be clear, my Order requiring the use of this Proposed Protocol for future productions-as amended by the specific metadata fields I have required and by the options I have offered the parties regarding the form of production for spreadsheets-is limited to this case. I am certainly not suggesting that the Proposed Protocol should be used as a standard production protocol in all cases. The production of individual static images on a small scale, where no automated review platform is likely to be used, may be perfectly reasonable depending on the scope and nature of the litigation.”

The impact of footnote 44 was top of mind when I recently spoke to Fmr. Judge Ron Hedges who chimed in:

“Attorneys must confer with regard to production requirements, as they should before bringing any dispute before a federal court. Moreover, attorneys should recognize that, as Judge Scheindlin said in footnote 44, that the selection of metadata fields to request are case-dependent.  Any attempt to arrive at a ‘universal’ or ‘bright line’ standard for production of metadata ignores the text of Rule 34(b) and the bargaining that occurs in meets-and-confers, and the unique aspects of individual civil actions.”

Despite agreeing with Judge Hedges’ sentiment, the main question in my mind will be whether footnote 44 is given its due weight going forward.  My concern is that, as is oft discussed with her Pension Committee decision, parties may home in on the bright line test and miss the nuances.  While it’s easy to argue against the folly of this thinking, it may not stop it from happening in the near term.

Finally, in another shout out to the Cooperation Proclamation, Judge Scheindlin takes a swipe at counsel, who forced her to rule on an “e-discovery issue that could have been avoided had the parties had the good sense to ‘meet and confer,’ ‘cooperate’ and generally make every effort to ‘communicate’ as to the form in which ESI would be produced.”

“The quoted words are found in opinion after opinion and yet lawyers fail to take the necessary steps to fulfill their obligations to each other and to the court. While certainly not rising to the level of a breach of an ethical obligation, such conduct certainly shows that all lawyers-even highly respected private lawyers, Government lawyers, and professors of law-need to make greater efforts to comply with the expectations that courts now demand of counsel with respect to expensive and time-consuming document production. Lawyers are all too ready to point the finger at the courts and the Rules for increasing the expense of litigation, but that expense could be greatly diminished if lawyers met their own obligations to ensure that document production is handled as expeditiously and inexpensively as possible. This can only be achieved through cooperation and communication.”

In the end, NDLON will continue to generate a ton of discussion (as did Zubulake and Pension Committee).  While this decision won’t single-handedly end the metadata discussion, it will hopefully serve as a launching point for more clarity down the road.  For this, practitioners on both sides of the debate should be thankful.

How Do You Sample Electronically Stored Information (ESI) in E-Discovery?

Wednesday, February 9th, 2011

When confronted with an almost impossible data analysis problem, a tried and true technique has been sampling. The mathematics behind sampling has been studied for many years, and sampling has been put into practice for well over seventy years in many fields, from predicting election results to assessing the quality of electric bulbs. Why not do the same for certifying your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the Electronic Discovery Reference Model (EDRM) Search Group authors covered in detail, with a strategy in a comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). And, while much of that work is still to hit the mainstream litigation scene as a general practice, I was pleasantly surprised to see it receive attention from a fellow blogger and litigator, Nick Brestoff, who highlighted this in a very thoughtfully crafted article in Law.com, titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties in getting a certifiable result that attorneys can stand behind. And, it is highly likely that the current practice is to certify your electronic discovery without a real measure of validity behind it.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the one who has the data, i.e., the producing party. While Rule 26(a) of the Federal Rules of Civil Procedure (FRCP) lists required disclosures and Rule 26(g) sets out signing and certification guidelines, there is no agreed-upon way to specify sampling parameters or to report the results of sampling. It is in this context that Nick Brestoff’s article is significant: it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that there is a logistical problem with this: most litigators will balk at producing largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to specify, in the Rule 26(f) meet and confer, that the production be certified for completeness by also including a statement on sampling and its results. A simple request such as, “Sample the data for 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing side can then perform random sampling per the goals in that request, selecting 13,526 documents (based on the sampling table of the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target.
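
For readers who want to see where a number of that size comes from, here is a minimal Python sketch of the textbook sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². This is a sketch under stated assumptions, not a reproduction of how the EDRM table was built: the function name `sample_size` is hypothetical, p = 0.5 is the usual worst-case choice, and no finite-population correction is applied. The figure quoted above appears consistent with this formula if the “2% error rate” is read as a total interval width (a margin of plus or minus 1%) and z is rounded to 2.326, but that reading is my inference.

```python
import math
from statistics import NormalDist

def sample_size(confidence, margin, p=0.5, z=None):
    """Documents to sample so an estimated proportion lies within +/- margin.

    confidence: two-sided confidence level, e.g. 0.98
    margin:     half-width of the interval, e.g. 0.01 for plus or minus 1%
    p:          assumed proportion; 0.5 is the most conservative choice
    z:          optional pre-rounded z-score (assumption: tables often round it)
    """
    if z is None:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

# 98% confidence, margin of +/- 1%, z rounded to 2.326 as a printed table might
print(sample_size(0.98, 0.01, z=2.326))
```

Reading the same request as a margin of plus or minus 2% instead would call for a much smaller sample, which is one more reason the parties should spell out exactly what they mean by “error rate” when they agree on sampling parameters.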

In addition to the EDRM Search Guide, The Sedona Conference Working Group commentary, Achieving Quality in the E-Discovery Process, is an indispensable resource for understanding the role of sampling. This paper discusses at length several sampling methods and their applicability for various purposes, including certifying that the results meet certain quality criteria. In addition, a number of electronic discovery cases have mentioned sampling as a way of overcoming the explosion of data volumes. A primary application of sampling is for evaluating proportionality claims, something that has moved from a simple assertion into an informed argument, with specificity on proving cost burden. Let’s examine a few.

Referring to the well-known Zubulake v. UBS Warburg, F.R.D. 280, the court in Makrakis v. Demelis, No. 09-706-C, 2010 WL 3004337 (July 13, 2010) ordered the producing party to essentially sample just a small number of backup tapes, at the expense of the requesting party. The ruling is also remarkable for shifting the cost of processing and reviewing the sample, however small, to the requesting party. Such measures, while reducing the overall cost of e-discovery, place a greater burden of sample selection on the requesting party, forcing them to apply the reasonableness evaluation.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery is appropriate and quoted an earlier case, S.E.C v. Collins & Aikman Corp, 256 F.R.D. 403, 418 (S.D.N.Y. 2009), that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling recommendation in this instance was both a reduction in the number of custodians from forty to three and a significant reduction in the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced drastically to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to run a small set of search terms against a smaller number of custodians so as to get a sense of the larger electronic discovery costs. Clearone Communications v. Chiang offers another example of sampling by the use of Boolean logic to combine more common search terms, thereby avoiding over-inclusiveness.

Per the Sedona commentary definitions, this type of sampling is referred to as “judgmental sampling,” wherein the practitioner has a general sense of which custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery sampling can embrace the benefits of statistical sampling as well. It is a natural next step, as even with the narrow criteria of judgmental sampling, the cost of review can be high. One advantage of statistical sampling is that it yields quantifiable measures of error and confidence intervals, while judgmental sampling has no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality and if the producing party needs a basis for certifying their productions, statistical sampling can be a powerful aid.
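
As a rough illustration of what “quantifiable measures of error” can look like, the Python sketch below draws a simple random sample of documents, has a reviewer code each one, and reports the estimated responsive rate with a confidence interval. It is a minimal sketch under stated assumptions: the function name `estimate_responsive_rate` and the `review` callback are hypothetical, and the interval uses the simple normal approximation (the Wald interval), which is only one of several methods and behaves poorly when the rate is near 0% or 100%.

```python
import math
import random
from statistics import NormalDist

def estimate_responsive_rate(doc_ids, review, sample_n, confidence=0.95, seed=None):
    """Estimate the fraction of responsive documents from a random sample.

    doc_ids:  list of document identifiers in the collection
    review:   hypothetical callback returning True if a document is responsive
    sample_n: number of documents to draw without replacement
    Returns (estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    sample = rng.sample(doc_ids, sample_n)
    hits = sum(1 for doc_id in sample if review(doc_id))
    p_hat = hits / sample_n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / sample_n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)
```

A judgmental sample offers no counterpart to the lower and upper bounds returned here, which is the practical difference between the two approaches.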

The Perils of Data Collection in High Stakes Litigation: Which Approach Is Right For Your Organization?

Monday, February 7th, 2011

Many organizations involved in litigation, investigations, or audits struggle to meet deadlines for collecting and producing electronically stored information (ESI) from employees without breaking the budget.  The biggest challenges are typically faced by large organizations with multiple offices and large numbers of employees.  However, even smaller organizations with few offices face challenges if they have remote employees or employees who travel frequently, aka road warriors.  In this first of a two-part series, I’ll discuss when and why organizations should choose a manual collection process.  Part two will discuss the advantages and disadvantages of two automated data collection approaches.

In each situation, the organization is faced with a request for ESI and some portion of the potentially relevant ESI is located in remote offices or on laptops used by road warriors.  Preserving and collecting ESI across multiple systems such as email and file servers, archival systems, Microsoft SharePoint, and personal computers can be challenging whether these systems are located centrally or in the cloud.  Common challenges include:

  • Pressing deadlines
  • Risk of data loss or deletion
  • Failure to produce responsive data without legal justification
  • Lack of information technology (IT) department resources
  • Miscommunication between the IT and legal departments

These challenges are compounded for organizations with remote offices or road warriors because more coordination and effort are inevitably required, thereby increasing expenses and the risk of failure.   The key to success is determining which data collection approach is best for your organization.  First, let’s discuss the traditional manual approach.

The Traditional Manual Approach

There are two different manual data collection approaches that organizations utilize with varying degrees of success: employee self-collection and IT assisted collection.

Employee Self-Collection

The various data collection approaches often begin as part of an investigation, litigation, or audit that requires the identification of employees likely to have data relevant to a particular matter.   Those employees, or data custodians as they’re called, are asked to forward or copy any relevant ESI they possess to a centralized location or storage device where the data is stored for later analysis and review by the legal team.  One problem with this approach is that copying files could result in metadata such as document dates being altered.  Another problem is that custodians’ memories fade over time and they may forget to produce relevant ESI.  Even worse, a custodian with a personal stake in the investigation may intentionally delete the very files being requested in an effort to thwart the investigation.  These scenarios could result in the organization facing sanctions or penalties, making employee self-collection a potentially risky and costly approach in almost any situation involving multiple custodians, offices, or large amounts of data.
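
The point about altered document dates is easy to demonstrate. The short Python sketch below copies a file with `shutil.copy2`, which attempts to carry the modified time across, whereas a plain content copy would stamp the new file with the time of the copy. The helper name `collect_preserving_times` is hypothetical, and this is only a sketch: which attributes survive depends on the operating system and file system, and it is not a substitute for a forensically sound collection tool.

```python
import os
import shutil

def collect_preserving_times(source_path, dest_dir):
    """Copy source_path into dest_dir, keeping its modified time where possible."""
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, os.path.basename(source_path))
    # copy2 copies the file contents and then attempts to copy file metadata
    # such as the last-modified time; a contents-only copy would not.
    shutil.copy2(source_path, dest_path)
    return dest_path
```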

IT Assisted Collections

The IT assisted collection approach is another manual approach that eliminates some of the risks associated with the employee self-collection method, but this approach often presents different challenges and often leads to “over collection” of ESI.  Typically one or more employees in the IT or IT Security Department are instructed to collect data from employees believed to have information relevant to a particular case.  To avoid overlooking or losing data, the IT resources collect data from numerous locations using computers loaded with specialized collection software.   Data to be collected from each relevant employee often resides on numerous devices including laptops, desktops, file servers, email servers, and other sources.   Once all the data for each custodian is collected from each data source, the data is copied and consolidated to a removable hard drive or drives where it awaits future processing, analysis, and review by the legal department.   Unfortunately for the IT department, this entire process is repeated for every new case and often results in a significant loss of productivity.

IT assisted collections were once the norm because this process was thought to represent the most efficient and effective way to avoid the risk of sanctions posed by the employee self-collection approach.   However, this approach is quickly falling out of vogue for two reasons:

First, IT assisted collections can increase the time, cost, and risk associated with data collection because the use of different technology tools can be challenging.   Organizations applying the IT assisted collection approach typically rely on off-the-shelf software such as Guidance EnCase, Robocopy, ExMerge, AccessData’s Forensic Toolkit (FTK) or other tools to collect data from each relevant custodian. Frequently, different tools are utilized to collect data from different data sources.  For example, it is not uncommon for the IT department to use ExMerge to collect from Microsoft Exchange, Robocopy to collect from file servers, EnCase to collect from laptops and desktops, and even other proprietary tools to collect data found in commonly used archives.  In addition to being time consuming, utilizing multiple tools to collect and consolidate data results in licensing, training, and maintenance costs for each product.  The risk of data loss or alteration is also heightened, since data collected with multiple tools must eventually be exported and consolidated for further processing, analysis, and review.  Lastly, using multiple IT staff with varying levels of expertise to collect data arguably increases the risk of metadata being altered and complicates the ability to maintain accurate chain of custody logs.  In practice, many organizations using multiple collection tools spend countless hours trying to manually maintain chain of custody reports using Excel spreadsheets while other organizations simply neglect or ignore chain of custody requirements.  Each of these situations virtually invites evidentiary attacks by savvy opponents.
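
To show what even a basic automated chain of custody entry might capture, here is a minimal Python sketch that hashes a collected file and appends a row to a CSV log. The function names, the log columns, and the choice of SHA-256 are illustrative assumptions, not a description of how any of the tools named above work or of what a court would require.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

# Hypothetical log layout; a real protocol would be agreed with counsel.
LOG_FIELDS = ["collected_at_utc", "custodian", "collector", "source_path", "sha256"]

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def log_collection(log_path, custodian, collector, source_path):
    """Append one chain of custody entry for source_path and return it."""
    entry = {
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
        "custodian": custodian,
        "collector": collector,
        "source_path": source_path,
        "sha256": sha256_of(source_path),
    }
    is_new_log = not os.path.exists(log_path)
    with open(log_path, "a", newline="") as log:
        writer = csv.DictWriter(log, fieldnames=LOG_FIELDS)
        if is_new_log:
            writer.writeheader()
        writer.writerow(entry)
    return entry
```

Recomputing the hash after the data has been exported and consolidated, and comparing it with the logged value, gives a simple check that the file was not altered along the way.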

The second reason IT assisted collections are falling into disfavor is that the approach often results in the over collection of data.  To avoid the risk of sanctions or penalties resulting from data loss or deletion, sometimes entire laptop and desktop hard drives are copied or “imaged” (frequently called a “forensic image”).  Similarly, IT resources are often incentivized to “copy everything” simply to avoid being forced to revisit data sources from which data has already been partially collected in response to a new request for information.

The IT assisted approach of forensically imaging drives can be effective in limited situations including criminal investigations and intellectual property theft cases since these matters sometimes require the recovery and analysis of deleted files, internet browsing history, and other non-user generated files for a discrete number of custodians.  However, since most large matters do not require this degree of data recovery for most data sources, unnecessarily collecting data by making forensic images often results in a significant waste of time and money.

Which Approach is Right for Your Organization?

The risks and expenses associated with both manual approaches described above are often so high that organizations sometimes decide it is economically more efficient to settle lawsuits even when the lawsuit lacks merit.  This untenable position has led many organizations to seek automated methods of managing data collection that are more efficient and repeatable.  These automated approaches will be explored in my next post.