Posts Tagged ‘recall’

As the Electronic Discovery World Zurns

Wednesday, July 29th, 2009

Judge Grimm’s Victor Stanley case was lauded by many as one of the most significant electronic discovery cases of 2008, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators.

“[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

Despite legions of articles and blogs on the topic, at least certain portions of the bench haven’t taken heed.  In the case In re: Zurn Pex Plumbing Products Liability Litigation, 2009 U.S. Dist. LEXIS 47636 (June 5, 2009) (hereinafter “Zurn”), U.S. District Judge Ann Montgomery receives points for understanding some basic e-discovery tenets around recall and precision, but then mysteriously goes where “angels fear to tread” by suggesting her own search terms.

Examining the case facts in more detail: Zurn is a class action products liability case where discovery was bifurcated (as is often the case; see Spieker v. Quest Cherokee) to first cover the class “certification” component.  Initially, the Magistrate partially closed the door on broader ESI discovery, stating that “while ESI may prove to be relevant to the first stage of discovery, we cannot meaningfully make that prediction now, and require the parties to engage in what could be vastly more expensive, and yet utterly futile, discovery.”  However, the Magistrate didn’t shut the door entirely, suggesting that “should the parties uncover voids in the information disclosed in hard copy form, they are . . . at liberty to press for further discovery including electronically stored information.”

Despite complying with Sedona’s Cooperation Proclamation (“The parties have worked amicably throughout the discovery process”), opposing counsel still came to loggerheads when the plaintiff found “voids” in the initial paper productions via third party discovery.  The plaintiff brought a motion to compel ESI discovery and the defendant objected, making two primary arguments: (1) the Magistrate had earlier ruled out ESI discovery and (2) performing ESI discovery would be unduly burdensome and expensive.

Judge Montgomery summarily rejected the first argument, but was concerned about the burden surrounding the proposed ESI discovery.  Here, the calculations get a bit confusing, but plaintiff’s request would have resulted in 361 gigabytes of ESI from employee email sources, as well as shared “J” and “K” drives.  The defendant multiplied the gigabyte number by 75,000 pages per gigabyte, arriving at roughly 27 million pages, and claimed that the work would have required “approximately seventeen weeks and cost $1,150,000, exclusive of vendor collection and processing costs, to review and process the data.”  Assuming a rather modest $1,000 per gigabyte for processing and hosting costs, defendants could’ve added another $400,000 for the project.
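The arithmetic behind these figures can be checked in a few lines. The sketch below uses only numbers quoted in this post (361 gigabytes, 75,000 pages per gigabyte, the $1,150,000 review estimate and the assumed $1,000 per gigabyte for processing and hosting). The variable names are illustrative, and the per-page figure is a derived value that does not appear in the opinion.

```python
# Figures quoted in the post; the names are illustrative only.
GIGABYTES = 361
PAGES_PER_GB = 75_000
REVIEW_COST_USD = 1_150_000
PROCESSING_USD_PER_GB = 1_000  # the "rather modest" rate assumed in the post

total_pages = GIGABYTES * PAGES_PER_GB
processing_cost = GIGABYTES * PROCESSING_USD_PER_GB
# Derived value: what the review estimate implies per page IF it covers every page.
review_cost_per_page = REVIEW_COST_USD / total_pages

print(f"Estimated pages: {total_pages:,}")
print(f"Processing and hosting: ${processing_cost:,}")
print(f"Implied review cost per page: ${review_cost_per_page:.3f}")
```

The page count comes to just over 27 million, and processing comes to $361,000, which the paragraph above rounds up to $400,000. If the review estimate really covered every page, it would work out to only a few cents per page, which may be part of why the court found the basis for the numbers unclear.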

Ultimately, the court was not persuaded by the supporting affidavits, nor the attorney’s representations about the resulting burden:

“It is unclear whether Zurn’s cost and time numbers are based on a review of 27 million pages of documents, the 3.6 million pages of documents limited to the J Drive and custodians’ emails, or a smaller sample of document pages likely to be flagged as a result of a search for certain relevant terms proposed by Plaintiffs. The affidavit of Ms. Freestone, an attorney and not an expert on document search and retrieval, is not compelling evidence that the search will be as burdensome as Zurn avers.”

The 361 gigabytes apparently resulted from “hits” corresponding to plaintiff’s 26 search terms.  The court correctly identified that those terms had precision issues (“many of Plaintiffs’ proposed search terms will likely produce a large number of ‘hits’ that have limited relevance in the case”).

Unfortunately, in an effort to increase the search precision, the Judge did not take heed of Judge Grimm’s warning and surprisingly took matters into her own hands: “the Court will limit the search to the following fourteen terms based on the likelihood that they will produce relevant documents without including a vast number of documents that are likely irrelevant to the litigation.”  Here is the Judge’s list of keywords:

(1) AADFW,
(2) Corrosion,
(3) Corrosive,
(4) Corrosive Water,
(5) Crack,
(6) De-zinc,
(7) Dezincification,
(8) DZR,
(9) Fail,
(10) IMR,
(11) Leak,
(12) MES,
(13) SCC,
(14) Stress corrosion cracking

Without looking at the underlying data, it’s clear from the outset that Judge Montgomery didn’t craft a good search strategy (as Judge Grimm might have predicted).  For example, terms 2, 3, 4 and 14 could’ve been captured by a single stemmed (wildcard) search using the term “corros*.” Without such a stemmed search approach, the terms would probably have been run singly in the proposed protocol, meaning that the hits for each term would have overlapped heavily with the hits for the others, thereby resulting in wasted attorney review time and processing costs.
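To make the duplication concrete, here is a small sketch. It is only an illustration: the three documents are invented, the wildcard is approximated with a regular expression, and no real review platform’s query syntax is implied. It runs the four overlapping terms one at a time and compares the total hit count with the number of unique documents a single “corros*” search would return.

```python
import re

# The four overlapping terms from the court's list (items 2, 3, 4 and 14).
COURT_TERMS = ["Corrosion", "Corrosive", "Corrosive Water", "Stress corrosion cracking"]

# "corros*" approximated as a case-insensitive regular expression.
WILDCARD = re.compile(r"\bcorros\w*", re.IGNORECASE)

# Hypothetical documents, invented purely for illustration.
DOCS = {
    "doc1": "Lab report notes corrosion near the fitting.",
    "doc2": "Corrosive water may cause stress corrosion cracking.",
    "doc3": "Shipping schedule for the third quarter.",
}

def hits_for_term(term):
    """Return IDs of documents containing the term (case-insensitive substring)."""
    return {doc_id for doc_id, text in DOCS.items() if term.lower() in text.lower()}

per_term_hits = {term: hits_for_term(term) for term in COURT_TERMS}
total_term_hits = sum(len(hits) for hits in per_term_hits.values())
union_of_terms = set().union(*per_term_hits.values())
wildcard_hits = {doc_id for doc_id, text in DOCS.items() if WILDCARD.search(text)}

print(f"Hits when terms are run singly: {total_term_hits}")
print(f"Unique documents behind those hits: {len(union_of_terms)}")
print(f"Documents found by corros*: {sorted(wildcard_hits)}")
```

In this toy collection the four terms generate five hits across only two unique documents, and the single wildcard search finds the same two documents once each.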

Judge Montgomery did recognize the potential error of her ways and gave the parties an out:

“The parties may decide on a different set of fourteen terms if they choose to do so. Additionally, if the search, as ordered by the Court, proves to be overly burdensome or costly, Zurn may renew its objection by presenting the Court with specific information including evidence from computer experts on applying the search terms, the number of documents identified, and the cost and time burdens of vetting documents.”

This “specific evidence” language seems to track notions from Sedona’s search best practices protocol, which prescribes sampling and iterative search term refinement.  What is surprising is that, knowing this, she would nevertheless blindly proffer the 14 term search strategy.  Instead, she should’ve quoted Victor Stanley and required the parties to come up with a data-driven approach that met requisite precision and recall metrics.
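The sampling idea can be sketched briefly. The example below is a minimal illustration and not the Sedona protocol itself: it draws a random sample from a hypothetical hit list, substitutes a made-up rule for attorney review, and estimates precision with a normal-approximation margin of error. The function name, the sample size of 400 and the labels are all assumptions.

```python
import math
import random

def estimate_precision(sample_labels, z=1.96):
    """Estimate precision from reviewed sample labels (True = relevant).

    Returns (estimate, margin) using the normal approximation to a
    binomial proportion; z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample_labels)
    p = sum(sample_labels) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin

# Hypothetical hit list: 10,000 document IDs returned by one search term.
hit_ids = list(range(10_000))

rng = random.Random(42)  # fixed seed so the same sample can be redrawn later
sample_ids = rng.sample(hit_ids, 400)

# Stand-in for attorney review: pretend every fifth document is relevant.
labels = [doc_id % 5 == 0 for doc_id in sample_ids]

precision, margin = estimate_precision(labels)
print(f"Estimated precision: {precision:.1%} +/- {margin:.1%}")
```

A term whose sampled precision is low would be a candidate for narrowing or replacement, and the same sample-and-review step could be repeated after each refinement.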

Five E-Discovery Questions with Craig Ball

Tuesday, August 12th, 2008

In the spirit of the popular New York Times magazine feature, with this post we inaugurate what we hope to be a long-running series of interviews with e-discovery luminaries to get their take on emerging ideas and trends (and hopefully have some fun as well).

Today’s questionee is e-discovery and forensics expert (and popular Law Technology News columnist) Craig Ball.  Craig’s combination of wit and insight speaks for itself, so let’s just get right to the questions.

1) The cases that are on everyone’s mind are O’Keefe/Lundin and Victor Stanley. What’s the practical impact of these rulings to the e-discovery practitioner?

Certainly these decisions have captured my enthusiastic attention.  Lawyers now have to devote greater care and thought to electronic search, and wake to the empirical evidence establishing the shocking shortfalls of keyword search in unstructured ESI collections.  The days of “let’s try these search terms and see what happens” are numbered.  Queries that will be run across mushrooming collections must pass muster in terms of noisiness, ambiguity, potential for misspelling, affinity to stemming, synonyms, slang, acronyms, IM-speak and other criteria unfamiliar to a profession that prides itself on precise expression.  Lawyers need to embrace concepts of “precision,” “recall” and “sampling” with the same fervor we once brought to the Statute of Frauds and the Rule Against Perpetuities.

Currently, lawyers on both the north and south sides of the docket are the unjust beneficiaries of slipshod search.  Requesting parties benefit from the economic leverage attendant to costly-yet-unavailing fishing expeditions while counsel for producing parties mint obscene pyramidal profits reviewing mountains of electrochaff.  Despite all the vitriol, rarely does either side’s counsel set out to exploit flawed searches.  It’s mostly blissful ignorance at work, coupled with little incentive to fix what’s broken.  Accordingly, Judges like Facciola and Grimm are picking up the baton and running with it.  It’ll be a long, tough race—and not every jurist will head for the tape—but I applaud those who’ve left the blocks!

Search demands nuance, discipline and scientific method.  Prepare to routinely test queries against sample collections, as soon that practice will be as commonplace as DNA testing in paternity cases.

2) What can e-discovery technology providers do to help?

At the risk of appearing ungracious, I can’t help but note that vendors eat at the same gluttonous table as lawyers, and vendor marketing is often so much snake oil.  Until the EDD vendor community takes a longer view of the market, stops building businesses for acquisition and starts building them to last, I don’t think they can be of much help.  The industry should stop pretending their processes and software are “proprietary” and touting their secret sauces.  Instead, how about delivering consistent, predictable service and pricing delivered by experienced, reliable and unflinchingly honest, genuinely knowledgeable personnel who welcome the chance to help lawyers understand this stuff.  If employees stayed around more than six months, that would be nice, too.

3) You recently participated in a new track at LegalTech West called FutureTech.  For those who missed it or the follow-up podcasts, what’s an emerging e-discovery trend that you think might take people by surprise?

Several come to mind.  Mediated meet-and-confer, for example.  The cost of a failed EDD effort can dwarf the amount in controversy, so it makes sense to turn to neutral, technically adept intermediaries to help resolve nettlesome questions of scope, search, forms of production and cost sharing.  Folks just behave better when company comes.  I also foresee divergence between discovery and the other traditional phases of litigation.  We may see entirely different teams handle discovery in a zealous but non-confrontational manner, leaving the scorched earth stuff to others.

Another development that will sneak up on most lawyers is the growing marginalization of text.  As natural interfaces emerge—where you will talk or gesture to your computers—and as communication gets more real time and visual, words will manifest conduct less frequently.  Take YouTube.  I don’t get it—to me, it’s silly and boring—but it’s rich and exciting to my kids…and text is tertiary.

Something else that will change is where we look for evidence.  If you were pursuing discovery against a teenager, where would you go to locate their most revealing ESI?   Social networking (virtualized storage)?   Cell phones and laptops (portable devices)?   Gaming devices (alternate platforms)?  In ten years, don’t imagine they won’t favor and extend the tools they grew up with.

Data is the ultimate portable commodity, so it’s odd we don’t take our computing environments with us. We will. If desktop machines survive, they will be little more than screens with network connectivity temporarily hosting the virtual identities we carry in our pockets or store online. Local hard drives will be an increasingly irrelevant place to search for files as EDD turns to personal storage devices and online storage.

Other trends lawyers may not foresee: People will retain much more data as there will be little incentive and less time to make it go away. “Cheaper to keep her” will be how most of us deal with data.  Location data will be routinely tracked by many devices with GPS functionality on and about our person, so this will become a new and useful evidence stream.  Virtual machines will be used as forms of production.  Local storage will give way to cloud storage.  Hey, I could do this one all day!

4) You have an extensive background in both e-discovery and computer forensics. Do you see a convergence, or will they remain largely separate worlds from a process and technology perspective?

I see convergence already.  “Forensically sound” practices are creeping into EDD harvest and traditionally rigid approaches to disk forensics are being challenged by the practical realities of immense volume and mission-critical operations.   We see the growth of “live” forensics, hash values displacing Bates numbers and operating systems allowing more and more deleted information to be easily resurrected.

The tools and techniques of each discipline are also converging.  But there will remain a distinction between the two flowing from the unique ability of a skilled forensics examiner to distill the bits and bytes into a compelling tale of human strength or frailty.  It’s painfully easy to misread the significance of digital footprints.  There’s a component of science and art to computer forensics that will ensure its distinction and growth.

We face convergent challenges, too.  In both forensics and EDD, the lure of lucre pulls in people who really ought to be doing something less harmful.  Lives, liberty, fortunes, and careers hinge on some computer forensic examinations; yet, some schools and tool sellers promote the notion that you can learn what you need to know over a long weekend.  Just as many copy shops decided they were e-discovery experts one dark night, a lot of poorly trained, incurious and careless forensic examiners are popping up all over.  I’m frankly appalled by some of what I see out there.   Where I hope we ultimately converge is a high standard of professionalism and proven expertise.

5) Finally, the question on the mind of every loyal “Ball in Your Court” reader: Which court is it — basketball, tennis, or volleyball?

I’ve never been much for team sports, but if I have to choose, I opt for the one played on the beach by fit, bikini-clad women.  I may be a hopeless nerd, but I’m not stupid.

What’s Different About E-Discovery Search?

Monday, May 5th, 2008

In his latest article, Craig Ball argues that lawyers “need to learn more about the science of search.” Craig says that at least part of the reason for this is that searching in e-discovery is challenging and different from the searching to which lawyers are accustomed.

“Lawyers believe themselves adept at keyword search in e-discovery because they’ve mastered keyword search in online legal research. The correlation is superficial at best. Unlike the crazy quilt of ESI, the language of reported cases is precise, consistent and structured. Misspellings are rare. Legal research is Disneyland. E-discovery is Baghdad.”

Last month, after my previous post, I had a conversation with Ron Friedman on a related litigation discovery topic, in which he made a similar argument about lawyers needing to learn e-discovery search tools.1

I think Craig and Ron make excellent points. E-Discovery using litigation support software search is different and it’s important for lawyers, investigators, litigation support professionals and other practitioners to understand how. The natural questions that arise from their arguments are: what is different about e-discovery search? How is it different from other familiar searches, such as web search and legal research search? The answers are important because they can help guide e-discovery experts on how to train lawyers and even guide attorneys during litigation discovery review. They are also important for developing e-discovery best practices and e-discovery search software.

I think the first step in answering these questions is to agree on the definition of e-discovery search, or, better said, the types of e-discovery search, since there are several. To address this appropriately would take at least another full litigation discovery post or a paper. As a result, I will leave the detailed discussion of these matters to another time, but for this discussion I will focus on searches used to identify potentially relevant documents for purposes of matter assessment (i.e., understanding the nature of the case: who did what, where, when and why) and for document production to the opposing party.

I have observed five major characteristics of e-discovery search that as a whole differentiate it from other searches. I would be interested to hear additional views on what is different about e-discovery search, so please comment on this post.

Recall
First, the cost of missing a relevant document, or low recall, can be very high in e-discovery. Missing a document that you should have produced could result in sanctions and adversely impact the case outcome. Missing key documents could also affect your legal strategy causing you to make sub-optimal decisions. Missing relevant documents can be costly in other searches as well. For example, in legal research, not identifying case law that is critical to your case could also have a detrimental impact on your legal strategy. However, low recall is on average costlier and more likely in e-discovery. In contrast to e-discovery and legal searchers, web search users are typically not very concerned with missing relevant documents. For the most part, they are interested in the most relevant documents, not all of the relevant documents. This is why Google rarely actually provides all the results for a search (you can try this yourself by paging to the end).

Precision
Second, the cost of returning false positives, otherwise known as low precision, in e-discovery searches is high. The results of e-discovery searches, including false positives, are typically produced and reviewed by humans at costs as high as several dollars per document. On the other hand, false positives have a minimal cost in web search because users either won’t see them if they are ranked low or will ignore them after minimal review. False positives can be costly during legal research in certain scenarios, such as when the stakes and nature of the case are such that many search results need to be exhaustively reviewed, but typically the costs are lower.
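For readers who want the two measures pinned down, here is a minimal sketch using the standard information-retrieval definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the search returned five documents, two of which
# are relevant, and it missed two other relevant documents.
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d2", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the five retrieved documents are false positives that someone must still pay to review, and half of the relevant documents were never found at all. In practice the full set of relevant documents is not known in advance, so recall usually has to be estimated, for example by sampling the documents the search did not return.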

Varied Language
Third, documents searched using litigation support software during e-discovery often include personal emails and files and frequently use varied language including jargon, slang, abbreviations, technical terminology, misspellings, and machine-created junk. This is Craig’s “Baghdad” point. In contrast, as Craig points out, documents searched during legal research, such as opinions, motions, etc. are typically well-structured documents with no misspellings, relatively consistent language etc. Even web sites are generally “cleaner” than typical e-discovery documents.
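A quick sketch shows how a single typo defeats an exact keyword match, and one possible way to catch it. This is an illustration only: the email text is invented, the `fuzzy_hits` helper is hypothetical, and Python’s standard `difflib` module stands in for whatever fuzzy matching a real litigation support tool might offer.

```python
import difflib
import re

def fuzzy_hits(keyword, text, cutoff=0.8):
    """Return the distinct words in text that closely resemble keyword."""
    words = set(re.findall(r"[a-z\-]+", text.lower()))
    return difflib.get_close_matches(keyword.lower(), sorted(words), n=10, cutoff=cutoff)

# Invented email with a misspelling of "dezincification".
email = "Seeing dezincifcation on the brass fittings again, pls advise re leaks."

exact_match = "dezincification" in email.lower()
near_matches = fuzzy_hits("dezincification", email)

print(f"Exact keyword match: {exact_match}")
print(f"Near matches: {near_matches}")
```

The exact search misses the document entirely, while the fuzzy comparison surfaces the misspelled word. The trade-off is that a looser cutoff finds more variants but also pulls in more false positives.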

Complexity
Fourth, users are often looking for different information when performing searches during discovery. E-Discovery searches are often aimed at comprehensively understanding “who did what, when, where and why” in a matter where the people involved may be trying to hide this information and where there may be no single “starting point”. As a result, e-discovery searchers often adopt strategies that involve large numbers of queries, and will follow the evidence and iteratively refine their searches for combinations of topics, people, places, etc. Legal searches can also be fairly complex, but as with other differences this is one of degree. These searches typically don’t involve hundreds of queries and terms, are often more narrowly defined and have a “starting point”. Web searches tend to be even simpler. Most are one or two words.

Transparency
Finally, e-discovery search is part of a legal process. The searches themselves are subject to negotiation with and review by opposing counsel and the court. This process can also take place over long time frames. As such, there is a great need for transparency in the development and execution of e-discovery searches. It is also important for e-discovery searchers to develop a defensible audit trail to prove what searches were run and what results were produced when. This is not the case in web or legal research.
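One way to picture such an audit trail is a log that records, for every search, what was run, when, and what came back. The sketch below is a hypothetical design, not a description of any particular product: the `SearchLogEntry` fields and the `log_search` helper are assumptions, and a SHA-256 digest of the sorted result IDs is used so that two runs can later be shown to have returned the same set.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class SearchLogEntry:
    run_at: str          # UTC timestamp in ISO 8601 format
    query: str           # the search exactly as it was run
    hit_count: int       # number of documents returned
    results_digest: str  # SHA-256 over the sorted document IDs

def log_search(log, query, hit_ids):
    """Append a record of one search to the log and return the new entry."""
    ordered = sorted(hit_ids)
    digest = hashlib.sha256("\n".join(ordered).encode("utf-8")).hexdigest()
    entry = SearchLogEntry(
        run_at=datetime.now(timezone.utc).isoformat(),
        query=query,
        hit_count=len(ordered),
        results_digest=digest,
    )
    log.append(entry)
    return entry

audit_log = []
entry = log_search(audit_log, "corros*", {"doc2", "doc1"})
print(json.dumps(asdict(entry), indent=2))
```

Because the IDs are sorted before hashing, the digest depends only on which documents were returned and not on the order in which they came back.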

These differences have a number of implications for e-discovery search best practices, training, software and more. I will discuss these in more detail in future posts. However, I think these differences make clear why Craig and Ron are right to suggest that people who are new to e-discovery can benefit from specialized training and tools. Similarly, for those of us who are deeply involved in e-discovery, I believe these differences point to the fact that there is still a lot of work to be done in developing best practices and software to make it easier for lawyers and other users to perform e-discovery searches effectively.

1 Ron also wrote another interesting post on this topic which can be found at PrismLegal.com.