Posts Tagged ‘TREC’

2009 TREC Legal Track Sheds Light on Search Efficacy in Electronic Discovery

Tuesday, July 27th, 2010

In one of my previous posts, I discussed the value and importance of TREC to the legal community. Clearwell Systems has been a TREC participant for the last two years, and believes in working with the other participants to advance the collective knowledge of electronic discovery-related information retrieval methodologies. TREC’s work is conducted in the context of annual workshops, and is organized in the form of specific tracks. For legal professionals, the TREC Legal Track is the most relevant, and track organizers have just released the much-awaited overview of the 2009 workshop. I will summarize the key results from the study and its broader implications.

The overview paper is now available and covers the design of the two tasks within the track – the Interactive task and the Batch task. The Interactive Task is very relevant for the legal community, since it is designed specifically for analyzing the task of producing specific records in response to a “discovery request”. As noted in the paper, 15 teams participated, including 10 commercial teams, up from three teams in 2008. The 2009 study was also the first time an email collection (based on Enron emails released by FERC) was used.

The Interactive Task involves a “mock complaint” and seven different topics, with each topic described in the form of a general information request. Several teams participated by choosing one or more topics and submitting responsive documents for each.  These were then assessed using a mathematically sound sampling and estimation methodology, and effectiveness metrics were computed for each team.

The critical summary measure is F1, which combines precision (the share of produced documents that are actually responsive, so low precision means many false positives) and recall (the share of responsive documents that were actually produced, so low recall means many false negatives). On six of the seven topics, the highest F1 measure achieved was very good, with values ranging from 0.614 to 0.840. As an example, an F1 measure of 0.840 was achieved with a Recall of 0.778 and Precision of 0.912. This implies that the information request was satisfied with very few false positives (8.8%) and few false negatives (22.2%). High precision means that your reviewers will be reviewing fewer irrelevant documents, which reduces your review workload and review costs. High recall means that few responsive documents were missed, so your case teams can be more confident that the facts of the case have been examined.
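The arithmetic behind that example can be checked in a few lines. This is a minimal sketch in Python, assuming F1 is the usual harmonic mean of precision and recall; the function and variable names are illustrative and are not taken from the overview paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the post for the best-performing topic.
precision, recall = 0.912, 0.778

f1 = f1_score(precision, recall)
# Share of produced documents that were not responsive.
false_positive_share = 1 - precision
# Share of responsive documents that were not produced.
false_negative_share = 1 - recall

print(f"F1 = {f1:.3f}")
print(f"false positives = {false_positive_share:.1%}")
print(f"false negatives = {false_negative_share:.1%}")
```

Because the harmonic mean is pulled toward the lower of the two inputs, a team cannot reach a high F1 by maximizing precision alone or recall alone.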

It’s always important to look not only at the results, but also at the costs incurred in achieving them. We can break this into the costs that each team incurred, and the costs that the assessors and topic authorities incurred. Unfortunately, the study did not track the amount of resources each team expended, so we will have to leave that as a possible improvement for a future study. To get a view of the second cost, a review of the tabulation of team interactions with topic authorities (Figure 3 of the overview paper) is helpful. In this study, the topic authority plays the role of a case expert. The numbers show that for some topics, a highly acceptable F-measure (over 0.75) was achieved even with interactions of 100 minutes, well below the 600 minutes allocated for each team. This would indicate that the teams were able to understand the request and construct meaningful searches with a very reasonable amount of involvement from a case expert.

The other interesting conclusion is that there is value in selecting a corpus containing attachments. The study gauged the contribution of attachments to responsiveness by measuring “document to attachment” ratios. For the responsive set, this ratio was 4.8 (i.e., responsive document families had, on average, one message and 3.8 attachments), significantly higher than the ratio of 2.2 for the entire population. This suggests that using a version of the Enron corpus that contained attachments was a very good decision.

Of course, the most revealing and controversial finding is with respect to the Assessment and Adjudication phase of the project. As noted in the overview paper’s section 2.3.4, the rate of success of appeals was significant, ranging from 82% to 97%. In other words, the initial sample assessments were reversed in an astonishingly large number of cases. One could argue that the appealed documents were carefully selected, but that argument is weakened by the varying number of appeals by participating teams, and by the success rate of even the teams with a larger number of submissions. As noted in the paper, the teams that invested greater amounts of resources in the appeals phase benefited proportionately in the levels of improvement of their final precision and recall numbers.

I know that constructing appeals can consume a lot of resources since, in addition to the normal information retrieval task, you are required to provide a convincing argument for reversing an initial judgment. This becomes very much a review exercise, not unlike the traditional manual review that the broader legal industry has been struggling with. For example, our own appeals budget was limited, forcing us to sample the appealed documents and select only a few. The outcome of this is that un-appealed documents are all assessed as relevant, which is not substantiated given the large number of appeals. In the final analysis, section 2.4.2 illustrates a salient indicator of success: teams that had a positive and useful interaction with the topic authority had the greatest success in initial assessments as well as in appeals, and the ones that leveraged this for the greatest number of appeals reported the highest F-measure.

The 2009 study saw a significant increase in participation from commercial teams. My own observation is that unlike academic teams, commercial teams tend to evaluate their participation in TREC projects through the narrow prism of short-term return on investment. While there is value in contributing to the community, I am sure each team is asked to justify the benefits of participation to its management. Some would argue that the full benefit is not realized because of the restrictions placed on dissemination of results within the broader community, especially in the area of marketing the results. I am sure every commercial participant would want to promote its performance, and highlight how its technology and methodology were superior. Given that such direct comparisons are not permitted, the ability to market your results is severely curtailed. The potential for comparative analysis could be a powerful motivator for all participating teams to invest more in the exercise, with the final outcome that the community benefits.

As I noted in my previous post, the legal e-discovery profession needs an independent authority that can challenge vendor claims and provide objective validation of one of the most complex areas of e-discovery – search and information retrieval. TREC has stepped in and served that need very effectively. This has been deservedly noticed by the people who matter: judges in cases involving electronic discovery, who express their opinions regarding “reasonableness” with respect to cost-shifting, adverse inference, motions to dismiss and other judgments.

A study of such magnitude is bound to have certain flaws, and these are documented in Section 2.5. Leaving aside these shortcomings, the TREC Legal Track effort is immensely useful for both participants and consumers/users of legal technologies and services. The value offered to the community by such studies is well captured in the companion report, titled the Economic Impact Assessment of NIST’s TREC program. As the TREC coordinators roll out their new TREC 2010 Legal Track tasks, it is clear that continued improvements in both design and execution will make the track even more attractive for all participants. Clearwell Systems is committed to the overall goals of TREC and intends to continue its involvement in the TREC 2010 Legal Track projects.

Better Search for E-Discovery

Tuesday, March 11th, 2008

I spend a lot of time researching and developing new search functionality, and working with enterprises and law firms to use this functionality to improve their e-discovery outcomes. To this end, I have followed the excellent research performed as part of the TREC legal track. I also recently attended an informative Sedona Conference webinar on “Search and Information Retrieval”, which contained a section on Information Retrieval (IR) Lessons for E-Discovery presented by Ellen Voorhees of NIST.

As I described some of this research to a colleague of mine, he asked me “So, what’s the so what? What’s the most important step our customers can make to improve the way they search in e-discovery matters based on your work with customers and this research?” My answer was a little surprising even to me. While good cases can be made for looking at concept search and newer, more automated ways of performing content analysis, I believe the most important step that customers can take is simply to get their “experts” to start iteratively searching the data as early as possible in a matter. Let me explain.

When I look at Ellen’s presentation and the findings from the TREC Legal Track 2006 research overview, three findings stand out to me:

  1. If you want to get more effective results as measured by “recall” (i.e., how many of the relevant documents did you find?) and “precision” (i.e., how many of the documents you found were relevant versus false positives), then the best way to achieve this is to write a better search query.
  2. One of the best ways to get better search queries is to commit human resources to improving them, by putting a “human-in-the-loop” while performing searches.
  3. The more expert the human, the better results you are going to get.1

In other words, what Ellen and the other researchers have found is that you get better results if the same person is running searches, evaluating the results, refining those queries, and trying again. The more expert the person, the better results you are going to get.
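The loop described above, in which one person runs a query, judges the results, and refines the query, can be tracked with the two measures from the first finding. This is a minimal sketch in Python, assuming the full set of relevant documents is known; in a real matter it is not, and would have to be estimated from reviewed samples. The document IDs and query rounds are made up for illustration and do not come from the TREC data.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query's result set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: IDs of the documents that are actually relevant.
relevant = {1, 2, 3, 4, 5}

# Hypothetical result sets from three successive versions of a query.
rounds = [
    ("initial keyword query", {1, 2, 6, 7, 8, 9}),
    ("refined after first review", {1, 2, 3, 6}),
    ("refined after second review", {1, 2, 3, 4, 10}),
]

history = []
for label, retrieved in rounds:
    p, r = precision_recall(retrieved, relevant)
    history.append((label, p, r))
    print(f"{label}: precision={p:.2f}, recall={r:.2f}")
```

Keeping a per-round record like `history` is one way for the person refining the queries to see whether each change actually helped, rather than waiting for a full review cycle to find out.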

Now, you may be thinking that this sounds like common sense, and I would completely agree with you. However, while this advice is clearly common sense to you and me, in my experience it is not always followed in our industry. Instead, it’s all too common that at the beginning of a matter someone comes up with a set of keyword queries, someone else runs these queries, some other people perform a detailed review of the results, and then finally the “expert” or attorney at the end of this process reviews the most important documents and/or a summary of the documents written by someone else. At this point, some new queries may be developed based on the results of the review and the process starts over again.

What’s the problem with this approach? While in the end this approach can be effective, it can be exceedingly costly and time-consuming. Instead, getting your “expert” (whether this is inside counsel, outside counsel, a subject matter expert, a litigation support professional, or a hired investigator) to interact with the data will allow you to find the most important information faster. That in turn enables you to make critical legal decisions sooner and to dramatically reduce the cost and risk associated with e-discovery.

So why don’t more people follow the common sense advice of getting an expert in front of the data experimenting with queries, interacting with the data and developing better queries? In my view, the single biggest reason is that the technology used to perform searches for e-discovery has simply not been easy enough for legal experts to use. As a result, these experts have got used to developing queries without using technology, and not iteratively interacting with the data over a short period of time.

But that’s changing. In the past few years, several intuitive e-discovery solutions have come to market that enable non-technical lawyers to run their own queries. More and more law firms and enterprises are leveraging these solutions to move to “human-in-the-loop” searching. The results are striking: better early case assessment, much shorter turnaround times, lower costs, and more accurate results.

1 This is my simplified interpretation of the findings of the TREC Legal Track. What was found was that an expert manual searcher performed well relative to other non-expert manual run results. Baron, J., Lewis, D., and Oard, D. “TREC-2006 Legal Track Overview.” The TREC research also contained other findings not covered in this post and I recommend reading the full document so that readers can draw their own conclusions.