Posts Tagged ‘transparency’

Defensible E-Discovery a Hot Topic at the Masters Conference

Thursday, October 29th, 2009

Recently, I moderated a panel at the Masters Conference with John Loveland, Sonya Thornton, and Bruce Markowitz entitled “How Defensible Is Your E-Discovery Process?” (Click here to read a summary of the panel.) It was well attended, and I think that the draw (aside from the esteemed panel) was that this topic remains very vexing for most practitioners.

Initially, we started at ground zero with the notion that defensibility is in most instances equated with the “reasonableness” standard, which is pervasive across many areas of the EDRM spectrum… from preservation to production.  Instances include:

  • Preservation — “[a]s soon as a potential claim is . . . identified, a party is under a duty to preserve evidence which it knows, or reasonably should know, is relevant to the future litigation.”
  • FRE 502(b) – the disclosure does not operate as a waiver in a Federal or State proceeding if . . . (2) the holder of the privilege or protection took reasonable steps to prevent disclosure;
  • General Privilege Waiver – In SEC v. Badian, 2009 WL 222783 (S.D.N.Y. Jan. 26, 2009), “there is no basis … to conclude that there were precautions [to prevent the disclosure], let alone whether they were reasonable.”
  • FRCP 37(e) — Absent exceptional circumstances, a court may not impose sanctions under these rules on a party for failing to provide electronically stored information lost as a result of the routine, good-faith operation of an electronic information system.

While the foregoing isn’t exhaustive, it does highlight the persistent nature of the reasonableness standard as practitioners seek a defensibility sanctuary.  The good news is that the law doesn’t require perfection, and there are a number of ways to obtain reasonable defensibility:

  • Demonstrable acceptance by the opposition – here the notion is that collaboration with the opposition allows the parties to comfortably move ahead with their discovery process; even if the protocol is not objectively reasonable, the parties’ consent to it will in most instances carry an imprimatur of reasonableness.
  • Auditing / process transparency.  Similar to the first bullet, auditing the process and giving the opposition visibility into the process steps will often make it hard for them to lodge successful downstream challenges.
  • Adherence to Local Rules (See 7th Circuit Pilot Program) or judicial order.  Another avenue that can provide some degree of safety is compliance with a discovery protocol mandated by local rules, although that compliance may ultimately be challenged.
  • Statistical confidence intervals / sampling – the use of statistics as a way to bolster process defensibility is starting to come to maturity, and in the future I think that detailed precision, recall and other statistical indicators will play a large role in e-discovery defensibility.

None of these steps can be guaranteed to get you off the hook when a rabid opposing party calls foul, but using them in a “belt and suspenders” fashion will certainly help buttress any discovery process.
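
To make the sampling bullet above concrete, here is a minimal sketch of how a confidence interval might be attached to a sample-based estimate. It assumes a simple random sample drawn from the documents that were culled out, uses the Wilson score interval (my choice for illustration, not something the panel prescribed), and the function name and sample figures are hypothetical.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    hits: sampled documents found to be responsive
    n:    total documents sampled
    z:    normal quantile (1.96 corresponds to roughly 95% confidence)
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0 <= hits <= n:
        raise ValueError("hits must be between 0 and n")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical numbers: 400 culled documents sampled, 6 turned out to be responsive.
low, high = wilson_interval(6, 400)
print(f"Estimated share of responsive documents left behind: {low:.2%} to {high:.2%}")
```

An interval like this is only as good as the sample behind it, so the sampling method itself would need to be documented alongside the result.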

For more illumination on the topic, please see the following video of my interview with John Loveland, who waxes poetic about discovery defensibility.

A Gross Inability to Craft Electronic Discovery Searches

Thursday, April 9th, 2009

The bashing of our judicial system seems to have reached a fever pitch.  Groups like the American College of Trial Lawyers (“ACTL”) have proclaimed in a recent report that while the “civil justice system is not broken, it is in serious need of repair.”  The blame game seems to have judges and attorneys alike pointing fingers.  The Fellows of the ACTL (perhaps not surprisingly) seem to pin some of the blame on the judiciary:

“Judges should have a more active role at the beginning of a case in designing the scope of discovery and the direction and timing of the case all the way to trial. Where abuses occur, judges are perceived not to enforce the rules effectively.”

Groups like the Sedona Conference chalk up many of the ills to the failure to cooperate, so much so that they’ve orchestrated a cooperation proclamation – which has picked up enough support from the bench to have garnered several cites in the case law (see, e.g., Mancia).

The bench, for its part, seems to put some of the onus on litigators and their reticence to get with the times.  William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 2009 WL 724954 (S.D.N.Y. Mar. 19, 2009) is the latest example of such a proclamation.  In this construction defect case, Judge Peck (a Sedona devotee) issues what he hopes will be a “wake-up” call to the bar about the need for “careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information (‘ESI’).”  In Gross, the court had to mediate an e-discovery dispute in which the requesting parties propounded a blatantly over-inclusive search request.  Unfortunately, the responding entity (Hill) was a non-party, and it simply buried its head in the sand.  This left the Court in the “uncomfortable position” of having to craft a “keyword search methodology for the parties, without adequate information from the parties (and Hill).”

Judge Peck’s exasperation with these antics was palpable.  Summing up the problem by citing Judge Grimm and Victor Stanley, he stated: “This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails.”  He further noted: “[w]hile this message has appeared in several cases from outside this Circuit, it appears that the message has not reached many members of our Bar.”

After noting both Sedona and Judge Facciola (of O’Keefe and Equity Analytics fame), Peck’s opinion reached a crescendo:

“Electronic discovery requires cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI. Moreover, where counsel are using keyword searches for retrieval of ESI, they at a minimum must carefully craft the appropriate keywords, with input from the ESI’s custodians as to the words and abbreviations they use, and the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives.’ It is time that the Bar-even those lawyers who did not come of age in the computer era-understand this.”

While it’s easy to see who Peck blames in this brouhaha, it takes (at least) two to tango: litigants on both sides of the “v” must move beyond the typical “seat of the pants” electronic discovery wrangling.  And judges need to be savvy enough to spot the issues to help/force the parties into such an enlightened/cooperative state.  Nothing short will get the job done.
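
One simple form the “quality control” testing Judge Peck calls for could take is a per-term hit report run against a sample of the collection before the terms are finalized. The sketch below is illustrative only: the function name, the sample documents, and the 50% threshold for flagging a term as too broad are all my assumptions, not anything drawn from the opinion.

```python
import re
from collections import Counter


def term_hit_report(docs, terms, max_share=0.5):
    """Count how many documents each proposed term hits and flag broad terms.

    max_share is an arbitrary, hypothetical cutoff: a term that hits more
    than this share of the sample is marked for discussion with the other side.
    """
    counts = Counter()
    for text in docs:
        lowered = text.lower()
        for term in terms:
            # Whole-word, case-insensitive match on the literal term
            if re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered):
                counts[term] += 1
    total = len(docs)
    report = {}
    for term in terms:
        share = counts[term] / total if total else 0.0
        report[term] = {
            "hits": counts[term],
            "share": share,
            "too_broad": share > max_share,
        }
    return report


# Hypothetical sample pulled from a custodian's mailbox
sample_docs = [
    "Project update: roof defect found at the courthouse site.",
    "Project budget meeting moved to Monday.",
    "Lunch order for the project team.",
    "Holiday schedule attached.",
]
report = term_hit_report(sample_docs, ["project", "defect"])
for term, stats in report.items():
    print(term, stats)
```

A report like this does not say which hits are relevant, but it gives both sides something concrete to discuss before the court has to step in.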

Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth.  The myth is that there is an agreed upon definition of concept search.  In actuality, when people in e-discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not actually talking about concept search technology at all and are actually talking about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling, for two reasons.  First, culling needs to be transparent.  You need to be able to reach agreement with, or at least explain to, the opposing side and the court exactly how you have culled the data set.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either agree on the set of search terms with the opposing side or use e-discovery search best practices to perform keyword searches on your own.

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that the group of documents categorized as being related to, for example, a company’s hiring policies includes all of the documents in your matter related to hiring policies, or that it excludes documents that are not really related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.
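
The “terminate” example can be sketched in a few lines. This is a toy illustration, not how any particular concept search product works: the six documents, their relevance labels, and the expansion list are all hypothetical, and the expansion is simulated with a plain keyword OR search.

```python
import re

# Hypothetical toy corpus: (text, relevant to employee termination?)
DOCS = [
    ("We will terminate his contract on Friday.", True),
    ("HR decided to fire the regional manager.", True),
    ("Please dismiss the temp staff by month end.", True),
    ("The fire department inspected the warehouse.", False),
    ("Fire drill scheduled for Tuesday.", False),
    ("Quarterly sales figures attached.", False),
]

# Assumed stand-in for the related terms a concept engine might add
EXPANSION = {"terminate": ["terminate", "fire", "dismiss"]}


def search(terms):
    """Return indexes of documents containing any term (as a word prefix)."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\w*",
        re.IGNORECASE,
    )
    return [i for i, (text, _) in enumerate(DOCS) if pattern.search(text)]


def precision_recall(hit_ids):
    """Precision = relevant hits / all hits; recall = relevant hits / all relevant."""
    relevant = {i for i, (_, is_relevant) in enumerate(DOCS) if is_relevant}
    true_positives = len(relevant & set(hit_ids))
    precision = true_positives / len(hit_ids) if hit_ids else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


keyword_hits = search(["terminate"])
expanded_hits = search(EXPANSION["terminate"])
print("keyword only:", precision_recall(keyword_hits))
print("expanded:    ", precision_recall(expanded_hits))
```

On this toy data the expanded search finds every relevant document but also pulls in the two fire-related false positives, which is the trade-off described above.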

So concept search can help address the under-inclusive problem with keyword search (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with content categorization.  These technologies have largely been designed as “black boxes”, which, as I have discussed in the past, makes sense for enterprise search but not for e-discovery search, and, as a result, could also be difficult to explain and defend.  For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.

What’s Different About E-Discovery Search?

Monday, May 5th, 2008

In his latest article, Craig Ball argues that lawyers “need to learn more about the science of search.” Craig says that at least part of the reason for this is that searching in e-discovery is challenging and different from the searching to which lawyers are accustomed.

“Lawyers believe themselves adept at keyword search in e-discovery because they’ve mastered keyword search in online legal research. The correlation is superficial at best. Unlike the crazy quilt of ESI, the language of reported cases is precise, consistent and structured. Misspellings are rare. Legal research is Disneyland. E-discovery is Baghdad.”

After my last post, I had a conversation on a related litigation discovery topic with Ron Friedman, who made a similar argument about lawyers needing to learn e-discovery search tools. [1]

I think Craig and Ron make excellent points. E-discovery search using litigation support software is different, and it’s important for lawyers, investigators, litigation support professionals and other practitioners to understand how. The natural questions that arise from their arguments are: what is different about e-discovery search? How is it different from other familiar searches, such as web search and legal research search? The answers are important because they can help guide e-discovery experts on how to train lawyers and even guide attorneys during litigation discovery review. They are also important for developing e-discovery best practices and e-discovery search software.

I think the first step in answering these questions is to agree on the definition of e-discovery search, or better said, the types of e-discovery search, since there are several. To address this appropriately would take at least another full litigation discovery post or a paper. As a result, I will leave the detailed discussion of these matters to another time, but for this discussion I will focus on searches used to identify potentially relevant documents for purposes of matter assessment (i.e., understanding the nature of the case: who did what, where, when and why) and for document production to the opposing party.

I have observed five major characteristics of e-discovery search that as a whole differentiate it from other searches. I would be interested to hear additional views on what is different about e-discovery search, so please comment on this post.

Recall
First, the cost of missing a relevant document, or low recall, can be very high in e-discovery. Missing a document that you should have produced could result in sanctions and adversely impact the case outcome. Missing key documents could also affect your legal strategy causing you to make sub-optimal decisions. Missing relevant documents can be costly in other searches as well. For example, in legal research, not identifying case law that is critical to your case could also have a detrimental impact on your legal strategy. However, low recall is on average costlier and more likely in e-discovery. In contrast to e-discovery and legal searchers, web search users are typically not very concerned with missing relevant documents. For the most part, they are interested in the most relevant documents, not all of the relevant documents. This is why Google rarely actually provides all the results for a search (you can try this yourself by paging to the end).

Precision
Second, the cost of returning false positives, otherwise known as low precision, in e-discovery searches is high. The results of e-discovery searches, including false positives, are typically reviewed by humans at costs as high as several dollars per document. On the other hand, false positives have a minimal cost in web search because users either won’t see them if they are ranked low or will ignore them after minimal review. False positives can be costly during legal research in certain scenarios, such as when the stakes and nature of the case are such that many search results need to be exhaustively reviewed, but typically the costs are lower.
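
A quick back-of-the-envelope calculation shows how precision drives review cost. The figures below are purely hypothetical (100,000 hits, 25% precision, $2 per document reviewed) and the function name is my own; the point is only the arithmetic.

```python
def review_cost(hit_count, precision, cost_per_doc):
    """Split the cost of reviewing a result set into useful and wasted spend.

    precision is the fraction of hits that are actually relevant (0 to 1).
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must be between 0 and 1")
    total = hit_count * cost_per_doc
    wasted = hit_count * (1 - precision) * cost_per_doc
    return {"total": total, "wasted_on_false_positives": wasted}


# Hypothetical inputs, not figures from any real matter
estimate = review_cost(hit_count=100_000, precision=0.25, cost_per_doc=2.0)
print(estimate)
```

With these assumed inputs, three quarters of the review budget goes to documents that should never have been returned.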

Varied Language
Third, documents searched using litigation support software during e-discovery often include personal emails and files and frequently use varied language including jargon, slang, abbreviations, technical terminology, misspellings, and machine-created junk. This is Craig’s “Baghdad” point. In contrast, as Craig points out, documents searched during legal research, such as opinions, motions, etc. are typically well-structured documents with no misspellings, relatively consistent language etc. Even web sites are generally “cleaner” than typical e-discovery documents.

Complexity
Fourth, users are often looking for different information when performing searches during discovery. E-Discovery searches are often aimed at comprehensively understanding “who did what, when, where and why” in a matter where the people involved may be trying to hide this information and where there may be no single “starting point”. As a result, e-discovery searchers often adopt strategies that involve large numbers of queries, and will follow the evidence and iteratively refine their searches for combinations of topics, people, places, etc. Legal searches can also be fairly complex, but as with other differences this is one of degree. These searches typically don’t involve hundreds of queries and terms, are often more narrowly defined and have a “starting point”. Web searches tend to be even simpler. Most are one or two words.

Transparency
Finally, e-discovery search is part of a legal process. The searches themselves are subject to negotiation with and review by opposing counsel and the court. This process can also take place over long time frames. As such, there is a great need for transparency in the development and execution of e-discovery searches. It is also important for e-discovery searchers to develop a defensible audit trail to prove what searches were run and what results were produced when. This is not the case in web search or legal research.
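
As a rough illustration of what such an audit trail could record, the sketch below logs each search with a timestamp, the query text, and a fingerprint of the result set, and chains the entries together so later edits are detectable. The record layout, function names, queries, and document IDs are all hypothetical; a real platform would have its own logging format.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body):
    """SHA-256 of a record serialized with sorted keys."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def audit_record(query, doc_ids, prev_hash=""):
    """Build one log entry; 'entry_hash' covers the entry and the previous hash."""
    body = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "hit_count": len(doc_ids),
        # Fingerprint of the result set, independent of result ordering
        "hits_sha256": hashlib.sha256("\n".join(sorted(doc_ids)).encode("utf-8")).hexdigest(),
        "prev_hash": prev_hash,
    }
    body["entry_hash"] = _digest(body)
    return body


def verify_chain(entries):
    """Return True if no entry has been altered, removed, or reordered."""
    prev = ""
    for entry in entries:
        body = {key: value for key, value in entry.items() if key != "entry_hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


# Hypothetical queries and document IDs
log = []
log.append(audit_record('terminat* AND "severance"', ["DOC-0007", "DOC-0042"]))
log.append(audit_record("fire NOT department", ["DOC-0042"], prev_hash=log[-1]["entry_hash"]))
print("log intact:", verify_chain(log))
```

Chaining each entry to the previous one is a design choice here, not a requirement; it simply makes it harder to quietly rewrite the history of what was searched and when.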

These differences have a number of implications for e-discovery search best practices, training, software and more. I will discuss these in more detail in future posts. However, I think these differences make clear why Craig and Ron are right to suggest that people who are new to e-discovery can benefit from specialized training and tools. Similarly for those of us who are deeply involved in e-discovery, I believe these differences point to the fact that there is still a lot of work to be done in developing best practices and software to make it easier for lawyers and other users to perform e-discovery searches effectively.

[1] Ron also wrote another interesting post on this topic, which can be found at PrismLegal.com.