Archive for the ‘search’ Category

Kroll Ontrack and Iron Mountain Stratify Demonstrate That “Free” Is Usually NOT The Cheapest Solution For Electronic Discovery

Tuesday, June 1st, 2010

Every car dealer knows he should focus customers on the monthly payment, not the total cost of the car. Every credit card solicitation (or sub-prime mortgage, for that matter) starts with the offer of 0% interest, not the actual interest rate or fees the customer will pay after the first 6 months. The reason is simple: once you lease the car or put a balance on the credit card, it’s very hard to switch away when – as often happens – you find yourself paying much more than you should later on.

I was reminded of these examples when reading about Kroll Ontrack’s offer of “free ECA” and Stratify’s recent press release announcing “free early stage filtering” for electronic discovery. Taking each in turn:

Kroll Ontrack Advanceview

Based on feedback from several customers in Washington DC, New York, and the Mid-West, Kroll Ontrack often provides Advanceview at no charge. That means customers can get “custodian de-duplication” and “1 keyword and date filter pass” for free, although Kroll still charges $200-250/hour for doing the work. The resulting data set is then processed and loaded into its review platform for $1,500-$1,800 per gigabyte.

Is this a good deal? For the vast majority of customers, the answer is “no” for three reasons.

First, customers typically end up paying more than they would using alternative products. For example, in the chart below, we compare the cost of using Kroll Ontrack to that of Clearwell for a 100 gigabyte project. In both cases, we assume customers are doing de-duplication, filtering, keyword searching, first pass review, and load file creation. As with any comparison of this sort, you have to make some simplifying assumptions. For example, we excluded data hosting fees and professional services fees from the analysis.

Whether customers are better off with Kroll depends entirely on how much data is culled out for free before customers incur the high back-end charges. Given that all Kroll is doing for free is custodian de-duplication and running one set of keywords and date filters, the typical cull rate is likely to be anywhere from 20% to 50%, nowhere near the 80% cull rate required for Kroll to be more cost effective than Clearwell.
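The break-even logic can be sketched in a few lines of Python. This is a rough illustration only, not the calculation behind the chart: the flat per-gigabyte price for the alternative (`FLAT_PRICE`) is a hypothetical figure picked so that the break-even lands at 80%, and the function names are invented for the example. Hosting and professional services fees are left out, as in the comparison above.

```python
def back_end_cost(total_gb, cull_rate, price_per_gb):
    """Cost when only the data that survives the free cull is charged."""
    return total_gb * (1.0 - cull_rate) * price_per_gb


def break_even_cull_rate(back_end_price_per_gb, flat_price_per_gb):
    """Cull rate at which back-end pricing equals a flat per-gigabyte
    price charged on all of the data."""
    return 1.0 - flat_price_per_gb / back_end_price_per_gb


TOTAL_GB = 100           # project size used in the post
BACK_END_PRICE = 1500.0  # low end of the $1,500-$1,800 per gigabyte range
FLAT_PRICE = 300.0       # hypothetical flat price, NOT a quoted figure

for cull_rate in (0.2, 0.5, 0.8):
    cost = back_end_cost(TOTAL_GB, cull_rate, BACK_END_PRICE)
    print(f"cull rate {cull_rate:.0%}: back-end cost ${cost:,.0f}")

flat_cost = TOTAL_GB * FLAT_PRICE
print(f"flat-price cost: ${flat_cost:,.0f}")
print(f"break-even cull rate: {break_even_cull_rate(BACK_END_PRICE, FLAT_PRICE):.0%}")
```

Under these assumed prices, any cull rate below the break-even leaves the "free" option more expensive, which is why the expected 20% to 50% range matters so much.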

The second reason why this is not a good deal is that it gives customers no certainty about costs. Culling rates from de-duplication and blind keyword searches are unpredictable and vary widely, meaning that some projects will cost more than expected while others will cost less. But every project has a budget that’s determined up front and, as any litigation support manager will tell you, you get much less credit for being under budget than you get pain for going over budget. That’s why cost certainty is one of the leading requests from anyone involved in electronic discovery.

Finally, excluding data based on a single round of keyword searches and date filters is not in line with The Sedona Conference best practices. Rather, Sedona recommends that customers iterate their keywords and culling strategies to hone them appropriately.

Iron Mountain Stratify OnPoint

It is not yet possible to do the same detailed analysis on Stratify’s OnPoint, which offers “free early stage filtering”, because it’s impossible to tell exactly what that means. In its artfully-worded press release and data sheet, Stratify promises to provide “free processing and loading of unlimited data for early stage filtering”. Does that include de-duplication? Does that include any keyword searching? My guess is “no”, in which case all they are really doing for free is offering to load data into their review platform so that they can then charge you, which is not a very compelling offer. But if anyone does know the answer to these questions, or if Stratify would like to clarify exactly what’s being offered for free, then please let me know and I’ll post an update.

Once data is in Stratify’s system, it charges a “one-time fee starting at $500 per gigabyte” for “reviewable data”. But it does not say if that’s the only fee. What about monthly hosting charges? Fees for additional reviewers? Again, it’s not yet clear what the downstream cost of review really is using Stratify, so it’s impossible to know whether this is a good deal.

If there’s one lesson from all of this, it’s “buyer beware”. Just as when you buy a car, sign up for a credit card, or click on that offer to get more corn on Farmville, you need to look beyond the “free offer” and understand what it’s really going to cost you.

As the Electronic Discovery World Zurns

Wednesday, July 29th, 2009

Judge Grimm’s Victor Stanley case was lauded by many as one of the most significant electronic discovery cases of 2008, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators.

“[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

Despite legions of articles and blogs on the topic, at least certain portions of the bench haven’t taken heed. In the case In re: Zurn Pex Plumbing Products Liability Litigation, 2009 U.S. Dist. LEXIS 47636 (June 5, 2009) (hereinafter “Zurn”), U.S. District Judge Ann Montgomery receives points for understanding some basic e-discovery tenets around recall and precision, but then mysteriously goes where “angels fear to tread” by suggesting her own search terms.

Looking at the facts in more detail, Zurn is a class action products liability case in which discovery was bifurcated (as is often the case – see Spieker v. Quest Cherokee) to first cover the class “certification” component. Initially, the Magistrate partially closed the door on broader ESI discovery, stating that “while ESI may prove to be relevant to the first stage of discovery, we cannot meaningfully make that prediction now, and require the parties to engage in what could be vastly more expensive, and yet utterly futile, discovery.” However, the Magistrate didn’t shut the door entirely, suggesting that “should the parties uncover voids in the information disclosed in hard copy form, they are . . . at liberty to press for further discovery including electronically stored information.”

Despite complying with Sedona’s Cooperation Proclamation (“The parties have worked amicably throughout the discovery process”), opposing counsel still came to loggerheads when the plaintiff found “voids” in the initial paper productions via third-party discovery. The plaintiff brought a motion to compel ESI discovery and the defendant objected, making two primary arguments: (1) the Magistrate had earlier ruled out ESI discovery and (2) performing ESI discovery would be unduly burdensome and expensive.

Judge Montgomery summarily rejected the first argument, but was concerned about the burden surrounding the proposed ESI discovery. Here, the calculations get a bit confusing, but plaintiff’s request would have resulted in 361 gigabytes of ESI from employee email sources, as well as shared “J” and “K” drives. The defendant multiplied the gigabyte number by 75,000 pages per gigabyte to arrive at roughly 27 million pages, and claimed it would take “approximately seventeen weeks and cost $1,150,000, exclusive of vendor collection and processing costs, to review and process the data.” Assuming a rather modest $1,000 per gigabyte for processing and hosting costs, defendants could’ve added roughly another $400,000 to the project.
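For readers who want to check the arithmetic, here is a minimal sketch in Python using only the figures quoted above. The variable names are mine, and the $1,000 per gigabyte processing rate is the post's own assumption rather than anything in the court record.

```python
DATA_GB = 361              # ESI volume implicated by plaintiff's request
PAGES_PER_GB = 75_000      # conversion factor used by the defendant
REVIEW_COST = 1_150_000    # defendant's review estimate, in dollars
PROCESSING_PER_GB = 1_000  # assumed processing and hosting rate, in dollars

pages = DATA_GB * PAGES_PER_GB            # total pages implied by the volume
processing = DATA_GB * PROCESSING_PER_GB  # added processing and hosting cost
total = REVIEW_COST + processing          # review plus processing

print(f"pages to review:  {pages:,}")
print(f"processing cost:  ${processing:,}")
print(f"review + process: ${total:,}")
```

The page count works out to just over 27 million, and the processing line to $361,000, which is in the neighborhood of the $400,000 mentioned above.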

Ultimately, the court was not persuaded by the supporting affidavits, nor the attorney’s representations about the resulting burden:

“It is unclear whether Zurn’s cost and time numbers are based on a review of 27 million pages of documents, the 3.6 million pages of documents limited to the J Drive and custodians’ emails, or a smaller sample of document pages likely to be flagged as a result of a search for certain relevant terms proposed by Plaintiffs. The affidavit of Ms. Freestone, an attorney and not an expert on document search and retrieval, is not compelling evidence that the search will be as burdensome as Zurn avers.”

The 361 gigabytes apparently resulted from “hits” corresponding to plaintiff’s 26 search terms. The court correctly identified that those terms had precision issues (“many of Plaintiffs’ proposed search terms will likely produce a large number of ‘hits’ that have limited relevance in the case.”)

Unfortunately, in an effort to increase the search precision, the Judge did not heed Judge Grimm’s warning and, surprisingly, took matters into her own hands: “the Court will limit the search to the following fourteen terms based on the likelihood that they will produce relevant documents without including a vast number of documents that are likely irrelevant to the litigation.” Here is the Judge’s list of keywords:

(1) AADFW,
(2) Corrosion,
(3) Corrosive,
(4) Corrosive Water,
(5) Crack,
(6) De-zinc,
(7) Dezincification,
(8) DZR,
(9) Fail,
(10) IMR,
(11) Leak,
(12) MES,
(13) SCC,
(14) Stress corrosion cracking

Without looking at the underlying data, it’s clear from the outset that Judge Montgomery didn’t craft a good search strategy (as Judge Grimm might have predicted). For example, terms 2, 3, 4 and 14 could’ve been captured by a single stemmed search using the term “corros*.” Without such a stemmed search approach, the terms would probably have been run singly in the proposed protocol, meaning that the same documents would’ve been hit again and again by overlapping terms, thereby resulting in wasted attorney review time and processing costs.
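The overlap is easy to see in a small Python sketch. Everything here is illustrative: the three documents are made up, the regular expression is just one way of approximating a “corros*” wildcard, and plain substring matching stands in for whatever search engine the parties would actually use.

```python
import re

# Terms 2, 3, 4 and 14 from the court's list, lowercased.
SEPARATE_TERMS = [
    "corrosion",
    "corrosive",
    "corrosive water",
    "stress corrosion cracking",
]

# Regex approximation of the wildcard search "corros*".
STEMMED = re.compile(r"\bcorros\w*", re.IGNORECASE)

# Hypothetical documents, invented for this example.
DOCS = {
    "doc1": "Field report: stress corrosion cracking found in brass fittings.",
    "doc2": "Corrosive water accelerated corrosion of the fitting.",
    "doc3": "Shipping schedule for next quarter.",
}


def hits_per_term(docs, terms):
    """Run each term on its own (simple substring match) and collect hits."""
    return {
        term: {doc_id for doc_id, text in docs.items() if term in text.lower()}
        for term in terms
    }


def stemmed_hits(docs):
    """Run the single stemmed pattern and collect hits."""
    return {doc_id for doc_id, text in docs.items() if STEMMED.search(text)}


per_term = hits_per_term(DOCS, SEPARATE_TERMS)
hits_run_singly = sum(len(ids) for ids in per_term.values())
unique_docs = stemmed_hits(DOCS)

print(f"hits when terms are run singly: {hits_run_singly}")
print(f"unique documents from corros*:  {len(unique_docs)}")
```

In this toy set the four separate terms generate five hits across only two distinct documents, while the single stemmed pattern returns those same two documents once each.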

Judge Montgomery did recognize the potential error of her ways and gave the parties an out:

“The parties may decide on a different set of fourteen terms if they choose to do so. Additionally, if the search, as ordered by the Court, proves to be overly burdensome or costly, Zurn may renew its objection by presenting the Court with specific information including evidence from computer experts on applying the search terms, the number of documents identified, and the cost and time burdens of vetting documents.”

This “specific evidence” language seems to track notions from Sedona’s search best practices protocol, which prescribes sampling and iterative search term refinement. What is surprising is that, knowing this, she would nevertheless blindly proffer the 14-term search strategy. Instead, she should’ve quoted Victor Stanley and required the parties to come up with a data-driven approach that met requisite precision and recall metrics.
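As a rough sketch of what a data-driven approach measures, the Python below computes precision and recall for a term set against a manually reviewed sample. The document IDs and relevance calls are invented for illustration, and nothing here is drawn from the Zurn record or from any specific Sedona protocol.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample: reviewers judged documents 1-4 to be relevant.
RELEVANT_IN_SAMPLE = {1, 2, 3, 4}

# Hypothetical hits from a first-pass term set and from a refined one.
FIRST_PASS_HITS = {1, 2, 5, 6, 7}
REFINED_HITS = {1, 2, 3, 8}

for label, hits in (("first pass", FIRST_PASS_HITS), ("refined", REFINED_HITS)):
    p, r = precision_recall(hits, RELEVANT_IN_SAMPLE)
    print(f"{label}: precision {p:.2f}, recall {r:.2f}")
```

Repeating this measurement after each round of term changes is the kind of iteration that gives the parties, and the court, actual numbers to argue over instead of intuitions about which keywords will work.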

EDRM Continues Drive to Solve Practical Electronic Discovery Problems

Tuesday, June 23rd, 2009

As most electronic discovery veterans are aware, the EDRM Project is an effort founded five years ago by George Socha and Tom Gelbmann to bring together a community of e-discovery practitioners for the purpose of solving some of the industry’s most challenging problems.

It may be hard to believe, but there was a time in the very recent past when the iconic EDRM model did not yet exist. No multicolored boxes, no arrows, no sloping volume and relevance lines; nothing. Coming up with a standard way of talking about electronic discovery was the first problem the group set about solving, and it would be hard to argue that they did not come up with the gold standard: a simple, clear, concise model that, at least so far, is standing the test of time as a way of thinking about the flow of the e-discovery process.

With each passing year, the group has started to address a broader set of problems, all with a practical bent. Currently, there are eight projects:

Project: Goal
Evergreen: Keep the EDRM model fresh and relevant as the industry grows and evolves
XML: Provide a standard, generally-accepted XML schema to facilitate the movement of electronically stored information from one step of the e-discovery process to the next
Metrics: Provide an effective means of measuring the time, money, and volumes associated with e-discovery activities
Code of Conduct: Develop aspirational voluntary ethical guidelines for e-discovery providers and consumers
Search: Provide a framework for defining and managing the various aspects of search as it applies to the e-discovery workflow
Data Set: Compile a 100 gigabyte public data set that can be used to test various aspects of e-discovery software and services
Jobs: Provide a professional resource for the e-discovery community and communicate about e-discovery related jobs
Information Management: Explore the emerging need for e-discovery standards in information management (the “upstream” part of the process)

This year’s annual EDRM conference took place back in May. After years of meeting in the same chilly and wind-swept location in downtown St. Paul, Minnesota, George and Tom had the brilliant idea of spicing up the meeting a bit by moving it to a more exotic locale: Bora Bora! Plans were set in motion, but quickly the overwhelming feedback came back from EDRM members: E-discovery is so fascinating, so heart-warming, that adding Bora Bora to the mix would simply be too much for the vast majority of the participants to bear. So St. Paul it was!

This was Clearwell’s third EDRM conference, and location aside, it’s been fascinating to see how it has changed over the last few years. Here are several notable trends from this year’s kickoff:

  • More participation from end-users: There was a definite increase in the number of end-user/consumer participants (that is, those not from the vendor community), particularly from law firms. This could be taken as further evidence that e-discovery is indeed moving in-house.
  • Increased enthusiasm to take on new challenges: One of the great things about EDRM is its willingness to try to tackle new areas that aren’t being directly addressed by some of the other (fantastic) organizations out there like Sedona. This was in evidence several years ago, when Clearwell was fortunate to get involved in the early stages of the EDRM XML project, which has proven to be a huge time, cost, and risk reducer for many in the industry by providing a common standard that can be used to move data within the e-discovery process. It was in evidence last year when Clearwell’s CTO was able to help launch a new effort around Search that is seeking to develop standards and best practices in an increasingly complex and contentious area. And, finally, it was in evidence this year with the launch of the Information Management project, a cutting-edge group that is exploring how to solve the challenges that e-discovery poses for information management – certainly a complex area in need of thought leadership.
  • Improved collaboration: One thing that has amazed us from day one is how collaborative EDRM is, and continues to become. There are a lot of e-discovery vendors involved who, outside of the confines of the St. Paul Hotel, aggressively compete in the marketplace. However, George and Tom have been able to create an environment at EDRM where competitive spirits are set aside and ideas can be cultivated which provide huge value across the e-discovery landscape (both vendor and consumer).

One final note: If you’re an e-discovery practitioner in a law firm or corporate setting, I’d encourage you to get connected, either informally (through the EDRM web site) or formally (by signing up for one or more of the projects). While end-user involvement continues to grow, there is definitely still a need for more non-vendor involvement. It is critical in ensuring real and relevant problems get solved, and to pushing the state of the art in e-discovery forward. Please join us!