Posts Tagged ‘sampling’

The Electronic Discovery Sheriff Is Back In Town

Thursday, January 29th, 2009

As Tiger Woods is to golf, the Honorable Shira A. Scheindlin is to electronic discovery.  She has unquestionably been the most dominant, visible, and outspoken jurist in the electronic discovery realm over the past decade, penning, among others, the Zubulake opinion, which is commonly referred to as the gold standard in electronic discovery.

But, like Woods, who recently took a sabbatical to mend his surgically repaired knee, Judge Scheindlin has recently been eclipsed by several other notable electronic discovery jurists, namely Judge Grimm (of Victor Stanley and Mancia fame) and Judge Facciola (aka “the Italian Stallion”), both of whom made numerous “best of the year” electronic discovery case law lists.

With Securities and Exchange Commission v. Collins & Aikman Corp., 2009 WL 94311 (S.D.N.Y., Jan. 13, 2009), Judge Scheindlin serves notice that the sheriff is back in town.  She not only tackles a number of thorny electronic discovery topics, but ambitiously takes on the US government in the process.  It’s a fairly lengthy opinion, well worth the read, so I’ll just excerpt a few of the notable takeaways.

As a bit of background, the Collins case centered on a securities fraud complaint brought by the SEC against Collins & Aikman Corp. and its former CEO David A. Stockman.  The crux of the dispute was the scope of the government’s obligations in civil discovery (as opposed to a purely SEC investigation).

There were four distinct but interrelated disputes, namely:

“(1) Whether identifying responsive documents that have been organized by the producing party invades the protection accorded to attorney work-product and how a government agency-acting in its investigative capacity-must respond to a request for the production of documents. (2) Whether a government agency may unilaterally restrict the scope of its search based on an assertion of an “undue burden” on limited public resources. (3) How much information the Government must disclose in order to allow an adversary-and the court-to assess an objection based on the deliberative process privilege. (4) Whether a government agency may unilaterally exclude its own e-mail from document production on the ground that most-but not all-will be privileged.”

Addressing the work product claims, the court found against the government, again reinforcing several recent opinions about electronic discovery search:

“The SEC contends that Stockman can search through the ten million pages and find substantially the same documents identified by the SEC without impinging on the thought processes of the SEC attorneys. Indeed-at significant expense and delay-Stockman could search the document databases using appropriate search terms, but the inaccuracy of such searches is by now relatively well known.  A page-by-page manual review of ten million pages of records is strikingly expensive in both monetary and human terms and constitutes “undue hardship” by any definition.” [Citing George L. Paul and Jason R. Baron’s article, “Information Inflation: Can the Legal System Adapt?”]

After losing the first battle, the SEC argued that even if the compilations were not protected as work product, it could produce the “complete, unfiltered, and unorganized investigatory file” since this was how the documents were “maintained in the usual course of its business.”  This second attempt was similarly unpersuasive, as Judge Scheindlin held that the “usual course of business” exemption did not apply:

“[C]onducting an investigation-which is by its very nature not routine or repetitive-cannot fall within the scope of the “usual course of business.” While the SEC routinely collects and maintains regulatory submissions such as 10-K reports, in its investigative capacity the agency conducts tailored probes of a company or an industry, requiring the gathering of records from diverse sources. Many if not most of the 1.7 million documents in the SEC production here were likely collected in the agency’s investigatory role. Thus it is no surprise that the complete collection is maintained as it was collected-in large disorderly databases. The documents can only be provided in a useful manner if the agency organizes or labels them to correspond to each demand.”

Next, Judge Scheindlin addressed the SEC’s decision to “unilaterally” limit its search to “centralized compilations” which ultimately “turned up nothing.”  She found that the SEC’s “blanket refusal to negotiate a workable search protocol” was “patently unreasonable” citing both Mancia and the Sedona Conference’s Cooperation Proclamation:

“Rule 26(f) requires the parties to hold a conference and prepare a discovery plan. … Had this been accomplished, the Court might not now be required to intervene in this particular dispute. I also draw the parties’ attention to the recently issued Sedona Conference Cooperation Proclamation, which urges parties to work in a cooperative rather than an adversarial manner to resolve discovery issues in order to stem the ‘rising monetary costs’ of discovery disputes.”

As the coup de grâce, Judge Scheindlin addressed and rejected out of hand the SEC’s most untenable claim: that it would not produce e-mail “generated or received by the Commission itself” because “nearly all responsive e-mails will be privileged, protected, or non-substantive.”

“Because e-mails are inherently searchable, the SEC’s blanket refusal to produce any in-coming or outgoing e-mails is unacceptable. Without even an attempt to negotiate search terms that would weed out privileged, protected, or irrelevant e-mails, the SEC cannot reasonably assert that a routine aspect of modern discovery-search and review of a party’s e-mail-is beyond its capability. Essentially, the SEC’s position is that the cost of such a search is simply too high, but it has made no effort to document the cost or the likelihood that it would produce relevant, nonprivileged material. The concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.”
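To make the idea of sampling “to test both the cost and the yield” concrete, here is a minimal sketch in Python. It is not drawn from the opinion: the function names, the review-speed and hourly-rate figures, and the normal-approximation margin of error are all assumptions chosen for illustration.

```python
import math
import random


def estimate_yield(population_size, sample_size, is_responsive, seed=0):
    """Review a simple random sample and project the responsive rate
    onto the whole collection, with a rough 95% margin of error."""
    rng = random.Random(seed)
    sample_ids = rng.sample(range(population_size), sample_size)
    # is_responsive stands in for a human reviewer's call on each document.
    hits = sum(1 for doc_id in sample_ids if is_responsive(doc_id))
    rate = hits / sample_size
    # Normal approximation for a proportion; crude for very small samples.
    margin = 1.96 * math.sqrt(rate * (1 - rate) / sample_size)
    return {
        "sample_hits": hits,
        "rate": rate,
        "margin": margin,
        "estimated_responsive": round(rate * population_size),
    }


def estimate_review_cost(document_count, docs_per_hour, hourly_rate):
    """Projected cost of reviewing document_count documents (hypothetical rates)."""
    return document_count / docs_per_hour * hourly_rate
```

A party making a burden argument could then put numbers on both sides of the ledger, for example `estimate_yield(100_000, 1_000, reviewer_call)` next to `estimate_review_cost(100_000, 50, 200.0)`, rather than simply asserting that the search is too expensive.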

At the end of the day, the Collins opinion seems to make the statement that Judge Scheindlin is back with a vengeance, and she’s serving notice that the government isn’t above the law:

“Like any ordinary litigant, the Government must abide by the Federal Rules of Civil Procedure.”

Besides knocking the government down a peg, Judge Scheindlin throws her judicial weight behind a number of important but nascent trends, including the Sedona Cooperation Proclamation, the related need to meet & confer, the use of sampling and the challenges of electronic discovery search. While none of these notions are groundbreaking, her substantial backing means increasing clarity for lawyers and litigation support practitioners everywhere.  And, that’s certainly welcome.

The “Artful” E-Discovery Dodger

Monday, October 13th, 2008

E-Discovery search has become a hot topic of late (in blogs and in the news), and I think it’s pretty clear that the unwashed (attorney) masses still don’t really grok the importance of using a defensible search protocol.  Neither do they seem to understand the enhanced scrutiny that’s being applied by the judiciary.

Kipperman v. Onex Corp., 2008 WL 4372005 (N.D. Ga. Sept. 19, 2008) is another in what will assuredly be a long string of cases that demonstrate how easy it is for litigators to get wrapped around the axle of e-discovery search.  In Kipperman, the defendant (Onex) presented several motions to the court, including attempts to obtain relief from the need to produce email identified after searching several backup tapes.

During a previous hearing the court ordered Onex to search all the mailboxes on two tapes, as well as on an additional tape selected by Plaintiff. The court determined that despite Onex’s objections and representations, the backup tapes were “producing meaningful discoverable information.”  The court was nevertheless sympathetic to Onex’s burden and therefore weighed in with some guidance:

“The court did suggest, … , that Plaintiff be more artful with its search terms and that Plaintiff utilize a list of the people, provided by Defendants, to review whether all mailboxes needed to be searched.”

The court also gave Onex the chance to narrow the search terms.  Unfortunately, they didn’t seize the opportunity to provide a narrower list or a refinement of their search terms.  “As such, they agreed to search and restore all the mailboxes with the search terms provided by Plaintiff.”

Not surprisingly, Onex then sought relief from having to review and produce all of the results from the search because the “broad search terms resulted in thousands and thousands of irrelevant hits.”  For example, the search terms included the word “republic,” which was used to elicit emails regarding Republic Builders Products, one of the companies involved in this matter.

“Defendants claim that the search captured thousands of irrelevant pages due to one occurrence of the word ‘republic’ often related to Onex business interests having nothing to do with Magnatrax in the ‘Republic of France,’ ‘Republic of Ireland,’ and ‘Czech Republic’.”
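As a rough illustration of what a more “artful” version of the “republic” term might look like, the sketch below keeps a hit only when the word appears outside a short list of noise phrases. The exclusion list is hypothetical and built only from the examples quoted above; it is not the protocol any party in Kipperman actually proposed.

```python
import re

# Hypothetical noise phrases, taken from the examples in the quoted passage.
NOISE = re.compile(
    r"\b(?:czech republic|republic of (?:france|ireland))\b",
    re.IGNORECASE,
)
TERM = re.compile(r"\brepublic\b", re.IGNORECASE)


def is_hit(text):
    """True if 'republic' still appears once the noise phrases are removed."""
    stripped = NOISE.sub(" ", text)
    return bool(TERM.search(stripped))
```

A message that mentions both the Republic of Ireland and Republic Builders Products still counts as a hit, since only the noise phrase is stripped before the term is tested. In practice the exclusion list would itself need to be sampled and negotiated, not assumed.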

Again the court reaffirmed its sympathy with Onex’s burden and yet denied the requested relief, in large part because Onex had been offered, and passed up, the chance to make the search more “artful”:

“[T]he court is not unsympathetic to the massive amount of discovery involved in this matter, the considerable burden of working with it, and the overproduction that often comes with e-mail production. Therefore, the court gave Defendants numerous tools by which to reduce the burden of e-mail discovery, including an opportunity to limit Plaintiff’s search terms and an opportunity to provide a list by which the number of peoples and the number of boxes being searched could be reduced. Defendants did not take advantage of these opportunities. Defendants must now lie in the bed that they have made. Thus, Defendants’ objections on the basis of relevancy and volume are DENIED.” (emphasis added).

Kipperman is probably not all that atypical.  Attorneys everywhere have historically used blunt e-discovery search instruments and haven’t often run afoul of the judiciary.  Now, post Victor Stanley et al., the playing field has changed dramatically.  It’s important to leverage best practices (from Sedona and others), craft a defensible search strategy, sample the results and “show your work.”  Missteps along the way, especially ones that the court has tried to help the parties avoid, won’t be met with much tolerance.

Judge Grimm, Victor Stanley, And The Problem Of “Black-Box” E-Discovery Search

Friday, August 22nd, 2008

Judge Paul Grimm’s recent opinion in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008) provides valuable guidance on one of the most important issues in e-discovery: how to conduct keyword searches in a defensible manner given that keyword searches are prone to produce over- and under-inclusive results.  The ruling suggests one of two approaches: either producing parties should adopt a “collaborative” approach to conducting keyword searches, whereby each party agrees on a search methodology; or, they should use a “best practices” approach, such as the one suggested by Sedona, where the producing party tests, samples, and iteratively refines searches so that they can demonstrate they have taken reasonable measures to reduce over- and under-inclusive results.

While the guidance is clear, following the guidance in practice is very difficult.  The primary reason for this is that the search technology being used in e-discovery today is not up to the task.  Specifically, today’s search technology suffers from three problems:

  1. The over- and under-inclusive tradeoff. Many technologies have been developed to address the tendency of keyword searches to miss relevant documents and produce under-inclusive results.  Wildcard and stemming technology has been developed in order to address the issue of finding common word variations in specified keywords.  Concept search has been designed to find documents containing words with similar meanings to the keywords in a search.  And fuzzy search technologies have been put in place to find misspellings of words. However, all of these suffer from the same problem: they produce too many non-relevant or “false positive” documents thus driving up the cost of review. For example, if someone runs the wildcard search “divers*”, then he or she not only gets the desired documents containing “diverse” and “diversity”, but also gets a large number of false positive documents containing “diversion”, “diversification”, and so on.  In the case of concept and fuzzy search, the problem is so great that these technologies to date have rarely been used in e-discovery.
  2. Too expensive to test, sample and refine searches. Today’s search technologies are largely designed to run one search at a time, not the dozens of searches that are typical in e-discovery. As a result, anyone trying to follow the best practices of testing, sampling, and refining each search will find themselves missing deadlines and running over budget because it takes so long. This also makes collaboration with the opposing party close to impossible, since there’s little time to iterate on, and agree upon, a set of keyword searches.
  3. Manual documentation. It’s not enough for producing parties to use best practices; they have to document them so that they can “show their work” to the court. Currently, documenting the search refinement process is mostly manual, with the result that it is either done inadequately or not at all.
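On the third point, the record-keeping itself need not be elaborate. The following Python sketch shows one way a search tool might log each refinement step automatically; the `SearchLog` class and its field names are hypothetical and do not describe any particular product.

```python
import json
from datetime import datetime, timezone


class SearchLog:
    """Append-only record of each query tried during search refinement."""

    def __init__(self):
        self.entries = []

    def record(self, query, hit_count, note=""):
        """Store the query, its hit count, and the reason for the change."""
        self.entries.append({
            "step": len(self.entries) + 1,
            "query": query,
            "hits": hit_count,
            "note": note,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self):
        """Serialize the full history so it can be shared or filed."""
        return json.dumps(self.entries, indent=2)
```

Calling `record()` after every query run leaves a step-by-step history that can be exported with `to_json()` when it is time to “show your work.”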

The reason why the search technology used for e-discovery has these problems is surprisingly simple: it’s because the technology was not designed for e-discovery in the first place. Rather, it was built for enterprise search, and was only later repurposed towards e-discovery.

The “Black Box” Of Enterprise Search

The core issue is that enterprise search technology has been designed to be a “black box”. Users enter a single search query into one end, and get results at the other, with no visibility into what happens in between. Going back to our previous example, when a user searches for “divers*” intending to find documents related to “diversity” or “diverse”, enterprise search engines give the user no visibility into the crucial step of query expansion and how it expands the search query into relevant and non-relevant terms like “diversion” and “diversification”. As a result, the user has no ability to minimize the false positives.
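To show what visibility into query expansion could look like, here is a small Python sketch that lists every word variant a wildcard matches, along with the number of documents containing each one. It is an illustration only: the tokenization is deliberately naive, and `expand_wildcard` is a hypothetical helper, not a function of any search engine named in this post.

```python
import fnmatch
from collections import Counter


def expand_wildcard(pattern, documents):
    """Return each distinct word matching the wildcard pattern, mapped to
    the number of documents in which that word appears."""
    counts = Counter()
    for text in documents:
        # Naive tokenization: split on whitespace, trim punctuation, lowercase.
        words = {w.strip('.,;:!?"()').lower() for w in text.split()}
        for word in words:
            if fnmatch.fnmatchcase(word, pattern):
                counts[word] += 1
    return counts
```

With a listing like this in hand, a reviewer could keep “diverse” and “diversity,” drop “diversion” and “diversification,” and run the narrower query instead of reviewing every false positive.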

In the same vein, when a user enters multiple queries into a “black box” enterprise search engine, all of the queries run as a single search, and the user has no visibility into which results are associated with which query. For example, a user who searches for “hiring OR interview” will get the results for the combination of the queries “hiring” and “interview”. He or she won’t know that only 5 documents contained “hiring” while 100 documents contained “interview.”  This limitation makes analyzing, sampling and refining searches costly and time consuming.
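By contrast, a tool that tracks results per term makes the breakdown obvious. The Python sketch below is a simplified, hypothetical illustration of that bookkeeping (exact-word matching only, no stemming), not a description of how any existing product works.

```python
def hits_by_term(terms, documents):
    """Map each search term to the set of document indexes containing it."""
    report = {term: set() for term in terms}
    for doc_id, text in enumerate(documents):
        # Naive tokenization: split on whitespace, trim punctuation, lowercase.
        words = {w.strip('.,;:!?"()').lower() for w in text.split()}
        for term in terms:
            if term.lower() in words:
                report[term].add(doc_id)
    return report


def combined_hits(report):
    """Document indexes matching any term, i.e. the result of the OR query."""
    return set().union(*report.values())
```

The combined set is what the “black box” returns; the per-term sets are what a reviewer needs in order to decide which term deserves sampling or refinement.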

That’s not to say that enterprise search products like Autonomy or Endeca are flawed. Far from it.  Their “black box” design works exceedingly well for the simple and quick queries that people want to run across the enterprise for general business purposes. If a sales manager is looking for a single proposal for her meeting the following day, then she doesn’t care how the search was performed or if it’s over-inclusive.  She’s only interested in the first page of relevant results, and for that use case enterprise search engines do a great job.

But e-discovery is a whole different world.  In e-discovery, users typically must review every single document in the search results, not just the most relevant ones.  As a result, over-inclusive searches can dramatically increase the costs of downstream production and review.  And under-inclusive searches raise the issue of defensibility.  Finally, e-discovery users have to run a lot of search queries and understand which documents are associated with each of those queries.

So, going back to the original problem, if current search technologies cannot help lawyers and litigation support professionals follow Judge Grimm’s guidance and address the “well-known limitations” of keyword search, what can? That will be the subject of my next post.

Five E-Discovery Questions with Craig Ball

Tuesday, August 12th, 2008

In the spirit of the popular New York Times Magazine feature, with this post we inaugurate what we hope to be a long-running series of interviews with e-discovery luminaries to get their take on emerging ideas and trends (and hopefully have some fun as well).

Today’s questionee is e-discovery and forensics expert (and popular Law Technology News columnist) Craig Ball.  Craig’s combination of wit and insight speaks for itself, so let’s just get right to the questions.

1) The cases that are on everyone’s mind are O’Keefe/Lundin and Victor Stanley. What’s the practical impact of these rulings to the e-discovery practitioner?

Certainly these decisions have captured my enthusiastic attention.  Lawyers now have to devote greater care and thought to electronic search, and wake to the empirical evidence establishing the shocking shortfalls of keyword search in unstructured ESI collections.  The days of “let’s try these search terms and see what happens” are numbered.  Queries that will be run across mushrooming collections must pass muster in terms of noisiness, ambiguity, potential for misspelling, affinity to stemming, synonyms, slang, acronyms, IM-speak and other criteria unfamiliar to a profession that prides itself on precise expression.  Lawyers need to embrace concepts of “precision,” “recall” and “sampling” with the same fervor we once brought to the Statute of Frauds and the Rule Against Perpetuities.

Currently, lawyers on both the north and south sides of the docket are the unjust beneficiaries of slipshod search.  Requesting parties benefit from the economic leverage attendant to costly-yet-unavailing fishing expeditions while counsel for producing parties mint obscene pyramidal profits reviewing mountains of electrochaff.  Despite all the vitriol, rarely does either side’s counsel set out to exploit flawed searches.  It’s mostly blissful ignorance at work, coupled with little incentive to fix what’s broken.  Accordingly, Judges like Facciola and Grimm are picking up the baton and running with it.  It’ll be a long, tough race—and not every jurist will head for the tape—but I applaud those who’ve left the blocks!

Search demands nuance, discipline and scientific method.  Prepare to routinely test queries against sample collections, as soon that practice will be as commonplace as DNA testing in paternity cases.
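For readers new to the terms, “precision” and “recall” can be computed from a reviewed sample in a few lines. This Python sketch is an editorial illustration, not part of Craig’s answer; the function name and the document IDs in the usage note are made up.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

If a query returns documents 1 through 4 from a sample in which documents 3 through 8 are actually relevant, `precision_recall({1, 2, 3, 4}, {3, 4, 5, 6, 7, 8})` gives a precision of one half and a recall of one third: half the hits are noise, and two thirds of the relevant documents were missed.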

2) What can e-discovery technology providers do to help?

At the risk of appearing ungracious, I can’t help but note that vendors eat at the same gluttonous table as lawyers, and vendor marketing is often so much snake oil.  Until the EDD vendor community takes a longer view of the market, stops building businesses for acquisition and starts building them to last, I don’t think they can be of much help.  The industry should stop pretending their processes and software are “proprietary” and touting their secret sauces.  Instead, how about delivering consistent, predictable service and pricing delivered by experienced, reliable and unflinchingly honest, genuinely knowledgeable personnel who welcome the chance to help lawyers understand this stuff.  If employees stayed around more than six months, that would be nice, too.

3) You recently participated in a new track at LegalTech West called FutureTech.  For those who missed it or the follow-up podcasts, what’s an emerging e-discovery trend that you think might take people by surprise?

Several come to mind.  Mediated meet-and-confer, for example.  The cost of a failed EDD effort can dwarf the amount in controversy, so it makes sense to turn to neutral, technically adept intermediaries to help resolve nettlesome questions of scope, search, forms of production and cost sharing.  Folks just behave better when company comes.  I also foresee divergence between discovery and the other traditional phases of litigation.  We may see entirely different teams handle discovery in a zealous but non-confrontational manner, leaving the scorched earth stuff to others.

Another development that will sneak up on most lawyers is the growing marginalization of text.  As natural interfaces emerge—where you will talk or gesture to your computers—and as communication gets more real time and visual, words will manifest conduct less frequently.  Take YouTube.  I don’t get it—to me, it’s silly and boring—but it’s rich and exciting to my kids…and text is tertiary.

Something else that will change is where we look for evidence.  If you were pursuing discovery against a teenager, where would you go to locate their most revealing ESI?   Social networking (virtualized storage)?   Cell phones and laptops (portable devices)?   Gaming devices (alternate platforms)?  In ten years, don’t imagine they won’t favor and extend the tools they grew up with.

Data is the ultimate portable commodity, so it’s odd we don’t take our computing environments with us. We will. If desktop machines survive, they will be little more than screens with network connectivity temporarily hosting the virtual identities we carry in our pockets or store online. Local hard drives will be an increasingly irrelevant place to search for files as EDD turns to personal storage devices and online storage.

Other trends lawyers may not foresee: People will retain much more data as there will be little incentive and less time to make it go away. “Cheaper to keep her” will be how most of us deal with data.  Location data will be routinely tracked by many devices with GPS functionality on and about our person, so this will become a new and useful evidence stream.  Virtual machines will be used as forms of production.  Local storage will give way to cloud storage.  Hey, I could do this one all day!

4) You have an extensive background in both e-discovery and computer forensics. Do you see a convergence, or will they remain largely separate worlds from a process and technology perspective?

I see convergence already.  “Forensically sound” practices are creeping into EDD harvest and traditionally rigid approaches to disk forensics are being challenged by the practical realities of immense volume and mission-critical operations.   We see the growth of “live” forensics, hash values displacing Bates numbers and operating systems allowing more and more deleted information to be easily resurrected.

The tools and techniques of each discipline are also converging.  But there will remain a distinction between the two flowing from the unique ability of a skilled forensics examiner to distill the bits and bytes into a compelling tale of human strength or frailty.  It’s painfully easy to misread the significance of digital footprints.  There’s a component of science and art to computer forensics that will insure its distinction and growth.

We face convergent challenges, too.  In both forensics and EDD, the lure of lucre pulls in people who really ought to be doing something less harmful.  Lives, liberty, fortunes, and careers hinge on some computer forensic examinations; yet, some schools and tool sellers promote the notion that you can learn what you need to know over a long weekend.  Just as many copy shops decided they were e-discovery experts one dark night, a lot of poorly trained, incurious and careless forensic examiners are popping up all over.  I’m frankly appalled by some of what I see out there.   Where I hope we ultimately converge is a high standard of professionalism and proven expertise.

5) Finally, the question on the mind of every loyal “Ball in Your Court” reader: Which court is it — basketball, tennis, or volleyball?

I’ve never been much for team sports, but if I have to choose, I opt for the one played on the beach by fit, bikini-clad women.  I may be a hopeless nerd, but I’m not stupid.