Posts Tagged ‘Sedona Working Group’

Better Search for E-Discovery

Tuesday, March 11th, 2008

I spend a lot of time researching and developing new search functionality, and working with enterprises and law firms to use this functionality to improve their e-discovery outcomes. To this end, I have followed the excellent research performed as part of the TREC legal track. I also recently attended an informative Sedona Conference webinar on “Search and Information Retrieval”, which contained a section on Information Retrieval (IR) Lessons for E-Discovery presented by Ellen Voorhees of NIST.

As I described some of this research to a colleague of mine, he asked me “So, what’s the so what? What’s the most important step our customers can take to improve the way they search in e-discovery matters, based on your work with customers and this research?” My answer was a little surprising even to me. While good cases can be made for concept search and newer, more automated ways of performing content analysis, I believe the most important step customers can take is simply to get their “experts” iteratively searching the data as early as possible in a matter. Let me explain.

When I look at Ellen’s presentation and the findings in the TREC 2006 Legal Track overview, three findings stand out to me:

  1. If you want to get more effective results as measured by “recall” (i.e., how many of the relevant documents did you find?) and “precision” (i.e., how many of the documents you found were relevant versus false positives), then the best way to achieve this is to write a better search query.
  2. One of the best ways to get better search queries is to commit human resources to improving them, by putting a “human-in-the-loop” while performing searches.
  3. The more expert the human, the better results you are going to get.¹
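The two measures in the first finding can be made concrete in a few lines of Python. This is a minimal sketch that assumes the full set of relevant documents is already known, as it is in a test collection with relevance judgments like TREC’s but not in a live matter; the function name and document IDs are hypothetical.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single search.

    recall    = relevant documents found / all relevant documents
    precision = relevant documents found / all documents retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # true positives
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical document IDs, for illustration only.
relevant = {"d1", "d2", "d3", "d4"}
first_query = {"d1", "d5", "d6", "d7"}    # initial keyword search
refined_query = {"d1", "d2", "d3", "d5"}  # after an expert refines the query

print(recall_precision(first_query, relevant))    # (0.25, 0.25)
print(recall_precision(refined_query, relevant))  # (0.75, 0.75)
```

In this toy example the refined query improves both measures at once, which is the kind of gain a human-in-the-loop searcher is aiming for.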

In other words, Ellen and the other researchers found that you get better results when the same person runs the searches, evaluates the results, refines the queries, and tries again. The more expert that person, the better the results.

Now, you may be thinking that this sounds like common sense, and I would completely agree. However, in my experience, this common-sense advice is not always followed in our industry. Instead, it’s all too common that at the beginning of a matter someone comes up with a set of keyword queries, someone else runs them, other people perform a detailed review of the results, and only at the end of this process does the “expert” or attorney review the most important documents and/or a summary of them written by someone else. At this point, some new queries may be developed based on the results of the review, and the process starts over again.

What’s the problem with this approach? While it can be effective in the end, it can be exceedingly costly and time-consuming. Instead, getting your “expert” (whether inside counsel, outside counsel, a subject matter expert, a litigation support professional, or a hired investigator) to interact with the data directly will allow you to find the most important information sooner. That, in turn, lets you make critical legal decisions faster and dramatically reduce the cost and risk associated with e-discovery.

So why don’t more people follow the common-sense advice of putting an expert in front of the data to experiment with queries, interact with the data, and develop better queries? In my view, the single biggest reason is that the technology used to perform searches for e-discovery has simply not been easy enough for legal experts to use. As a result, these experts have grown used to developing queries without using the technology themselves, rather than interacting iteratively with the data over a short period of time.

But that’s changing. In the past few years, several intuitive e-discovery solutions have come to market that enable non-technical lawyers to run their own queries. More and more law firms and enterprises are leveraging these solutions to move to “human-in-the-loop” searching. The results are striking: better early case assessment, much shorter turnaround times, lower costs, and more accurate results.

1 This is my simplified interpretation of the findings of the TREC legal track. Specifically, the track found that an expert manual searcher performed well relative to the non-expert manual runs. Baron, J., Lewis, D., and Oard, D. “TREC-2006 Legal Track Overview.” The TREC research also contained other findings not covered in this post, and I recommend reading the full document so that you can draw your own conclusions.

E-Discovery Review Platforms: The Merits Of “Review Faster” vs. “Review Less”

Wednesday, January 23rd, 2008

Perhaps the single greatest component of e-discovery costs is review: the painstaking process in which teams of attorneys evaluate information to determine its relevance to the case at hand. Why has review become so expensive? A recent Sedona Working Group paper explains:

In 1990, a typical gigabyte of storage cost about $20,000; today it costs less than $1. As a result, more individuals and companies are generating, receiving and storing more data, which means more information must be gathered, considered, reviewed and produced in litigation. But, with billable rates for junior associates at many law firms now starting at over $200 per hour, the cost to review just one gigabyte of data can easily exceed $30,000.

That’s quite a difference: $1 to store a gigabyte of data versus $30,000 to review it. That gap has driven corporate legal departments and law firms to embrace e-discovery review platforms. These platforms, which can be either packaged software or a hosted service, typically emphasize one of two main benefits:

  • “Review Faster”: Traditional review platforms increase attorney productivity by increasing the number of documents attorneys can review each hour. For example, the name “Attenex” derives from the claim that it will help attorneys review documents “at 10x” the speed they could otherwise. These products help up to a point, but no matter how good the software, there is a limit to how many documents the human brain can digest in a day, so even with them, review remains very expensive;
  • “Review Less”: More recent e-discovery solutions focus on having attorneys review fewer documents by culling data before review. This can massively reduce review costs, since 80% or more of documents can be eliminated without being read, but it does raise one serious question: how can you be sure that responsive documents are not inadvertently culled?
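The economics behind the two approaches can be sketched with a back-of-the-envelope cost model: review cost equals the number of documents reviewed, divided by documents reviewed per hour, times the billable rate. (At the quoted $200 per hour, $30,000 per gigabyte corresponds to about 150 attorney hours.) This is a hypothetical illustration only; the collection size, review rate, and 2x speed-up are placeholders, not figures from the Sedona paper or any vendor.

```python
def review_cost(docs_reviewed, docs_per_hour, hourly_rate):
    """Attorney review cost: hours of review times the billable rate."""
    return docs_reviewed / docs_per_hour * hourly_rate


# Hypothetical inputs, for illustration only.
DOCS = 100_000       # documents collected for the matter
DOCS_PER_HOUR = 50   # unassisted review rate per attorney
RATE = 200           # dollars per hour (the Sedona figure)
SPEEDUP = 2          # "review faster" multiplier; capped in practice by reading speed
CULL = 0.80          # "review less": fraction of documents culled before review

remaining = round(DOCS * (1 - CULL))  # documents left to review after culling

scenarios = {
    "baseline": review_cost(DOCS, DOCS_PER_HOUR, RATE),
    "review faster": review_cost(DOCS, DOCS_PER_HOUR * SPEEDUP, RATE),
    "review less": review_cost(remaining, DOCS_PER_HOUR, RATE),
    "both": review_cost(remaining, DOCS_PER_HOUR * SPEEDUP, RATE),
}
for name, cost in scenarios.items():
    print(f"{name:>13}: ${cost:,.0f}")
```

With these placeholder inputs, culling 80% of the collection cuts the bill five-fold, the same effect as a 5x review speed-up, and the two approaches multiply rather than compete.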

The technical term for this issue is “elusion”: of all the material judged non-responsive, what proportion is in fact responsive (i.e., how many false negatives does your culling methodology produce)? It is virtually impossible to answer that question definitively without a human reviewing the entire dataset for relevance, which, of course, defeats the point of culling in the first place. So the accepted practice is to use statistical sampling: you review a random sample of the culled documents, which lets you estimate the elusion rate for the whole population at a known confidence level. For example, a random sample of roughly 400 documents gives a margin of error of about ±5% at a 95% confidence level (roughly two standard errors), and that sample size barely changes as the collection grows. How easy is this to do? Actually, it’s pretty straightforward. Any good e-discovery solution should let you create a separate folder containing a random sample of the non-responsive documents for human review, as a quick check on the effectiveness of culling. You can set the size of the sample according to the margin of error and confidence level you want.
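The sampling check can be sketched in Python. This is a minimal illustration assuming simple random sampling and the normal approximation to the binomial; the function names, the 80,000-document culled set, and the four responsive documents found in the sample are all hypothetical. `sample_size` uses the standard formula n = z²·p(1−p)/e², with p = 0.5 as the most conservative choice, and can optionally apply the finite population correction; `estimate_elusion` projects the responsive rate in the reviewed sample onto the whole culled set.

```python
import math
import random


def sample_size(margin, z=1.96, p=0.5, population=None):
    """Documents to sample for a +/- margin at the confidence level implied by z.

    Normal approximation: n = z^2 * p * (1 - p) / margin^2.
    If population is given, apply the finite population correction.
    """
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def estimate_elusion(judgments, culled_count, z=1.96):
    """Estimate elusion from a human-reviewed random sample of culled documents.

    judgments: list of booleans, True where the reviewer found the document responsive.
    Returns (elusion rate, estimated responsive documents missed, margin of error).
    """
    k = len(judgments)
    rate = sum(judgments) / k
    margin = z * math.sqrt(rate * (1 - rate) / k)
    return rate, rate * culled_count, margin


# Hypothetical culled set, for illustration only.
culled_ids = [f"doc{i}" for i in range(80_000)]
n = sample_size(0.05)                     # 385 documents for +/-5% at 95%
to_review = random.sample(culled_ids, n)  # e.g. the contents of a QC folder

# A reviewer judges each sampled document; suppose 4 turn out to be responsive.
judgments = [True] * 4 + [False] * (n - 4)
rate, missed, moe = estimate_elusion(judgments, len(culled_ids))
print(f"elusion rate: {rate:.2%} (+/- {moe:.2%})")
print(f"estimated responsive documents missed: {missed:,.0f}")
```

When the observed elusion rate is very low, as in this example, the normal-approximation margin is unreliable; an exact binomial interval (for example, Clopper-Pearson) is the safer choice.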

This is an area that Sedona and others have considered in great depth, and there are many excellent papers on the subject by people far more knowledgeable than me. To pick just a few, Herbert Roitblat has written extensively about sampling and elusion in e-discovery, and Daticon’s paper, though a few years old, is well worth reading to understand the origins of the “review less” movement.

Practically speaking, as someone who has seen both approaches in action, I think that “review faster” is helpful, but if you want to massively reduce your e-discovery costs, then the big win is “review less” – even with sampling to mitigate concerns about elusion.