Archive for the ‘cull-down’ Category

Kroll Ontrack and Iron Mountain Stratify Demonstrate That “Free” Is Usually NOT The Cheapest Solution For Electronic Discovery

Tuesday, June 1st, 2010

Every car dealer knows he should focus customers on the monthly payment, not the total cost of the car. Every credit card solicitation (or sub-prime mortgage, for that matter) starts with the offer of 0% interest, not the actual interest rate or fees the customer will pay after the first 6 months. The reason is simple: once you lease the car or put a balance on the credit card, it’s very hard to switch away when – as often happens – you find yourself paying much more than you should later on.

I was reminded of these examples when reading about Kroll Ontrack’s offer of “free ECA” and Stratify’s recent press release announcing “free early stage filtering” for electronic discovery. Taking each in turn:

Kroll Ontrack Advanceview

Based on feedback from several customers in Washington DC, New York, and the Mid-West, Kroll Ontrack often provides Advanceview at no charge. That means customers can get “custodian de-duplication” and “1 keyword and date filter pass” for free, although Kroll still charges $200-250/hour for doing the work. The resulting data set is then processed and loaded into its review platform for $1,500-$1,800 per gigabyte.

Is this a good deal? For the vast majority of customers, the answer is “no” for three reasons.

First, customers typically end up paying more than they would using alternative products. For example, in the chart below, we compare the cost of using Kroll Ontrack to that of Clearwell for a 100 gigabyte project. In both cases, we assume customers are doing de-duplication, filtering, keyword searching, first pass review, and load file creation. As with any comparison of this sort, you have to make some simplifying assumptions. For example, we excluded data hosting fees and professional services fees from the analysis.

Whether customers are better off with Kroll depends entirely on how much data is culled out for free before customers incur the high, back-end charges. Given that all Kroll is doing for free is custodian de-duplication and running one set of keywords and date filters, the typical cull rate is likely to be anywhere from 20% to 50%, which is nowhere near the 80% cull rate required for Kroll to be more cost effective than Clearwell.
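To make the sensitivity to cull rate concrete, here is a minimal sketch of the back-end charge alone for the 100 gigabyte project, using only the $1,500-$1,800 per gigabyte range quoted above. The function name is illustrative, and hourly services and hosting are left out, as in the comparison itself.

```python
# Back-end cost of a "free culling" offer at different cull rates.
# Rates are the $1,500-$1,800/GB range quoted in the post; hourly
# services and hosting fees are deliberately excluded.

def back_end_cost(total_gb, cull_rate, rate_per_gb):
    """Cost of processing and loading the data that survives culling."""
    remaining_gb = total_gb * (1 - cull_rate)
    return remaining_gb * rate_per_gb

PROJECT_GB = 100
LOW_RATE, HIGH_RATE = 1_500, 1_800

for cull_rate in (0.20, 0.50, 0.80):
    low = back_end_cost(PROJECT_GB, cull_rate, LOW_RATE)
    high = back_end_cost(PROJECT_GB, cull_rate, HIGH_RATE)
    print(f"cull {cull_rate:.0%}: ${low:,.0f} to ${high:,.0f}")
```

At a 20% cull rate, 80 gigabytes remain and the back-end bill runs from $120,000 to $144,000; at 50% it is $75,000 to $90,000; only at 80% does it fall to $30,000 to $36,000.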

The second reason why this is not a good deal is that it gives customers no certainty about costs. Culling rates from de-duplication and blind keyword searches are unpredictable and vary widely, meaning that some projects will cost more than expected while others will cost less. But every project has a budget that’s determined up front and, as any litigation support manager will tell you, you get much less credit for being under budget than you get pain for going over budget. That’s why cost certainty is one of the leading requests from anyone involved in electronic discovery.

Finally, excluding data based on a single round of keyword searches and date filters is not in line with The Sedona Conference best practices. Rather, Sedona recommends that customers iterate their keywords and culling strategies to hone them appropriately.

Iron Mountain Stratify OnPoint

It is not yet possible to do the same detailed analysis on Stratify’s OnPoint, which offers “free early stage filtering”, because it’s impossible to tell exactly what that means. In its artfully-worded press release and data sheet, Stratify promises to provide “free processing and loading of unlimited data for early stage filtering”. Does that include de-duplication? Does that include any keyword searching? My guess is “no”, in which case all they are really doing for free is offering to load data into their review platform so that they can then charge you – not a very compelling offer. But if anyone does know the answer to these questions, or if Stratify would like to clarify exactly what’s being offered for free, then please let me know and I’ll post an update.

Once data is in Stratify’s system, it charges a “one-time fee starting at $500 per gigabyte” for “reviewable data”. But it does not say if that’s the only fee. What about monthly hosting charges? Fees for additional reviewers? Again, it’s not yet clear what the downstream cost of review really is using Stratify, so it’s impossible to know whether this is a good deal.

If there’s one lesson from all of this, it’s “buyer beware”. Just as when you buy a car, sign up for a credit card, or click on that offer to get more corn on Farmville, you need to look beyond the “free offer” and understand what it’s really going to cost you.

Electronic Discovery Services: The Price is Right?

Wednesday, June 17th, 2009

Maybe this will show my age, but I’ve been around the electronic discovery business since the days when pricing was both simple and very expensive. Terabytes were at the mythical high-end of the spectrum and gigabytes of “e-docs” (not “ESI”) cost $3,000 – $4,000 to process. Understandably (and fortunately for most), pricing models have evolved, thanks in part to more educated consumers and initiatives such as Sedona’s RFP + Vendor Panel.

Leaving the WABAC machine and moving into present times, we’re starting to see some variance from traditional pricing models that primarily focus on data “into” the processing machine. More and more companies (such as Kroll Ontrack) are moving to models that price on data “out” of the process. Since that’s a bit nebulous, an example might illustrate:

Traditionally, in a somewhat simplified fashion, an electronic discovery project would be priced by the amount of data in the initial corpus (say 100 gigabytes) and processing would be priced at $500 a gigabyte (for round numbers purposes). Leaving out the sometimes significant caveat that the 100 gigabytes would likely increase due to expansion of compressed files, this would mean that the bulk of the project expenses would be $50,000 ($500 x 100), plus relatively nominal costs for monthly hosting and user access rights.

At the end of the day, after elimination of system files, deduplication and application of search terms (reducing the initial corpus by say 70% collectively) there would be 30 gigabytes remaining for hosting and possible production, both of which are most often priced separately.

Given rampant commoditization there’s an arms race underway among certain service providers where they’re now changing the above model to give away initial processing as a loss leader – pricing only on the data that comes out the end of the processing/search step. In this approach the above workflow would largely stay the same, but the vendor would charge a higher rate for what ultimately is hosted on the back-end. If this back-end fee was $2,000 per resulting gigabyte and the same 30 gigabytes was seen out the back end, then the customer would pay $60,000 for the project. But, if the deduplication, searching, culling, etc. was more effective (at say 80%) then the resulting 20 gigabytes would only cost $40,000.
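The two pricing models can be compared with a few lines of arithmetic. This is a minimal sketch using the round numbers from the example ($500 per gigabyte in, $2,000 per gigabyte out, 100 gigabytes); the function names are mine, and hosting and user access fees are ignored for simplicity.

```python
# Compare "data in" pricing with "data out" pricing for one project.
# Default rates are the round numbers used in the post's example.

def input_priced_cost(input_gb, rate_per_input_gb=500):
    """Traditional model: pay for every gigabyte that goes in."""
    return input_gb * rate_per_input_gb

def output_priced_cost(input_gb, cull_rate, rate_per_output_gb=2_000):
    """Loss-leader model: pay only for the gigabytes left after culling."""
    return input_gb * (1 - cull_rate) * rate_per_output_gb

def break_even_cull_rate(rate_per_input_gb=500, rate_per_output_gb=2_000):
    """Cull rate at which both models cost the same."""
    return 1 - rate_per_input_gb / rate_per_output_gb

INPUT_GB = 100
print(f"data-in model: ${input_priced_cost(INPUT_GB):,.0f}")
for cull_rate in (0.70, 0.80):
    cost = output_priced_cost(INPUT_GB, cull_rate)
    print(f"data-out model at {cull_rate:.0%} cull: ${cost:,.0f}")
print(f"break-even cull rate: {break_even_cull_rate():.0%}")
```

With these rates the break-even point is a 75% cull rate: below it the "free processing" model costs more than the traditional $50,000, and above it the model costs less.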

The question then, as Clint Eastwood would put it, is: “Do you feel lucky?” This pricing model forces attorneys and litigation support managers to guesstimate what culling, search, and de-duplication rates they’ll likely get on the data corpus. Guess right and they save the end client money; guess wrong and they’re way over budget.

The dynamics of this purchasing decision are a bit atypical because the buyer (usually counsel) doesn’t pay the bills, so the decision can often be more vexing than most. When a direct consumer gambles on pricing, things will ideally balance out over time, with money being saved in some instances and overspent in others. But when the buyer doesn’t pay the bills, the motivation is less clear.

Thoughts run to Maslow’s hierarchy of needs to determine which pricing model is ultimately more compelling: (a) price certainty/adherence to budget, or (b) cost variability and the opportunity to save money. While it’s never good to understate the upside of saving money (Esteem), I think ultimately there’s a more fundamental need (Safety) to stay within budget and avoid the painful (sometimes client imperiling) call to discuss how a given e-discovery project has gone way over budget.

This calculation is made further vexing because it not only pits the purchasing party against unknown data culling/searching rates, but it also puts the vendor in an ethical bind where they make less money if they’re supremely effective at data reduction, whereas if they’re either intentionally or accidentally beneficiaries of relatively little data reduction then they stand to make a ton of upside.

It’s like you went to Vegas to gamble your kid’s college fund and on top of the already questionable house odds you knew that the dealer stood to profit by your losses. So, as for myself, no, I don’t feel lucky.

Time to Work Together on Electronic Discovery

Friday, February 27th, 2009

Cheesy Successories posters aside (for an alternative take, go here), the need to work together is much more than just a cliché in today’s environment.

In its recent brief on the five major trends that will shape business technology in 2009, leading management consultancy McKinsey and Company noted one trend in particular which highlights the urgent need for an organization’s IT and legal groups to forge better, faster, and more efficient ways of collaborating on electronic discovery issues:

Regulators demand more from IT

Government scrutiny of business will intensify in many developed countries. Already, in the United States, the Office of the Comptroller of the Currency weighs in on the resiliency of banking systems, the Food and Drug Administration (FDA) requires that many pharmaceutical systems be “validated,” and Sarbanes-Oxley drives decisions about accounting systems in every industry. In the future, policy makers and regulators will probably demand that IT systems capture more and better data in order to gain greater insight into and control over how banks manage risk, pharma companies manage drugs, and industrial companies affect the environment. Government officials also will monitor many legal and business rules more closely to ensure compliance with mandates. Successful CIOs should enhance their relationships with internal legal and corporate-affairs teams and be prepared to engage productively with regulators. They will need to seek solutions that meet government mandates at manageable cost and with minimal disruption.

- McKinsey Quarterly, February 2009

The current economic environment is creating a “Double Whammy” within almost every enterprise that has ongoing or pending electronic discovery issues (and are there many organizations left out there that don’t?):

  • As the McKinsey article notes, regulators will increasingly be demanding more from IT as government scrutiny of business intensifies. Just look at the just-launched recovery.gov site to see the level of transparency and accountability that the government is aiming for with regard to the stimulus package. The bailout will not directly affect every business, but there is a new sheriff in town who will likely set the tone across the entire business landscape.
  • At the same time, there is relentless pressure on controlling costs. When times are tough, dollars that can be saved on the expense side are much more valuable than top-line revenue, since 100% of every dollar of cost savings goes directly to the bottom line.

The net-net: Enterprises will be forced to do more, with less.

How? With regard to electronic discovery, there is a lot of low-hanging fruit to be picked in the area of IT and legal cooperation:

  • In-house legal teams should meet with IT (if they aren’t already) to help them better understand the nature of electronic discovery, particularly as it applies to the more “upstream” parts of the process (specifically, identification, preservation, and collection) which IT tends to be more responsible for. Through a better understanding of the nature of electronic discovery, IT can improve its ability to find the right documents, avoiding over-collection and reducing downstream processing costs. In addition, new electronic discovery technologies are making it increasingly easy for legal to own more of the process, reducing the electronic discovery burden on IT.
  • Conversely, IT should coordinate with in-house legal teams to provide advice and mentoring as legal seeks to bring e-discovery platforms in-house to assist with early case assessment, search, culling, and analysis. To many legal teams, bringing e-discovery in-house may seem like a daunting proposition, but enterprise software has been around for a long time, and learning from IT’s experiences can make the process far less intimidating.

Yes, regulators are going to be far more demanding in the future than they have been in the past. But some simple collaboration and coordination between IT and legal will go a long way toward lightening the regulatory burden, especially as it pertains to electronic discovery.

E-Discovery 911: Reducing Enterprise Electronic Discovery Costs in a Recession

Friday, February 20th, 2009

In today’s economy, controlling electronic discovery costs has taken on a new urgency.  Because the financials of many companies have deteriorated so quickly, there is great interest in finding methods to reduce any costs in the short-term.  As a result, anyone in a company’s IT or legal department who comes up with a plan to substantially reduce their company’s electronic discovery costs in the short-term is likely to become a hero in their company.  So, what’s the best way to reduce electronic discovery costs quickly?

A natural first step is to decide where to focus.  Which electronic discovery activities are the most costly today?  Which have the greatest room for cost reductions?  The EDRM model serves as a good guide for answering such questions by breaking electronic discovery activities into Information Management, Identification, Collection, Preservation, Processing, Analysis, Review, Production and Presentation.  One thing I have noticed when interacting with enterprises is that the IT and legal departments tend to focus on different stages within electronic discovery based on their perspective.  IT managers naturally concentrate on the information management, identification, collection and preservation activities because these are the activities in which they are most involved.  Similarly, legal managers naturally look to preservation, processing, production and review.

Given these different perspectives, it’s important to take an objective approach to calculating electronic discovery costs.  Doing so is not that easy.  Costs can vary significantly depending on each company, the nature of the case, the nature of the data, which vendors/technologies are used and a variety of other factors.  Costs also come in many different forms: direct hard dollar costs, such as spending on legal discovery and electronic discovery fees delivered by third parties; indirect hard dollar costs, such as time spent by company employees; and soft dollar costs, such as increased risk that could lead to adverse judgments and sanctions.  Finally, electronic discovery costs are often buried across both legal operating budgets and IT budgets, making it hard to separate these costs from the costs of other activities.

Undertaking an internal analysis to understand your company’s electronic discovery costs is a valuable activity if you want to better control these costs.  However, while costs do vary between companies, most companies will find that the same activities contribute the most direct hard dollar costs and that these are the costs that are easiest to control in the short-term.  To demonstrate this, let’s walk through a generic cost analysis of a typical case.  Fortunately, we don’t have to start from scratch in doing this.  Leonard Deutchman, an author of several excellent electronic discovery articles, has already done most of the work in a May 2007 article, “Get Ready for the Rules Changes, Part VIII”.  In this article, Mr. Deutchman walks the reader through a hypothetical litigation between an Investor and a Venture Capital firm.  He describes the typical electronic discovery activities and calculates the direct hard dollar costs for these activities including:

  • Collection: Mr. Deutchman calculates that it costs $10k to collect 400GB from 8 hard drives and the data of 8 custodians on file and email servers using an outside vendor (doing it in-house can be less expensive).  Note that this excludes any collection from back-up tapes, which can be more costly.
  • Culling & Processing: it costs $4k to reduce the 400GB to 90GB by removing non-relevant file types prior to processing.  Processing 90GB costs $90k at $1000/GB.  De-duplication and the application of search terms reduce the data to 25GB.
  • Production: it costs $4k to produce the 4GB of data that is deemed responsive and not privileged to produce to the other side.

Mr. Deutchman doesn’t identify direct hard dollar costs for Information Management, Identification or Preservation.  These activities are typically not associated with direct hard dollar costs on a per matter basis.  Rather, they involve indirect hard dollar costs such as employee time and software licenses.  Mr. Deutchman also does not provide an estimate for the costs of review.  However, since review does contribute significant direct hard dollar costs for every matter, this gap needs to be filled in order to get a complete sense of the direct hard dollar costs.  The two big buckets of cost in review are: attorney review costs and review software costs.  In Mr. Deutchman’s hypothetical litigation one might imagine the following scenario for these costs:

  • 25GB translates into 195,000 documents using the low end of the documents-per-GB ranges for email (9,000/GB) and for files (7,000/GB), taken from industry survey data available from EDRM.  This example assumes that 40% of the 25GB is email.
  • The attorneys reviewing the data charge $75/hour and make 100 document decisions per hour.  This translates to approximately $146,000.
  • The hosted review service costs $50/GB/month and, in this case, let’s assume we host it for 6 paid months.  This costs $7,500.
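The review estimate above can be reproduced step by step. The sketch below uses only the figures from the three bullets; the function and variable names are mine, chosen for illustration.

```python
# Reproduce the review cost estimate from the assumptions listed above.

def review_costs(total_gb=25, email_share=0.40,
                 docs_per_gb_email=9_000, docs_per_gb_files=7_000,
                 attorney_rate=75, decisions_per_hour=100,
                 hosting_rate_per_gb_month=50, hosted_months=6):
    """Return document count, attorney review cost and hosting cost."""
    email_gb = total_gb * email_share
    file_gb = total_gb - email_gb
    documents = email_gb * docs_per_gb_email + file_gb * docs_per_gb_files
    review_hours = documents / decisions_per_hour
    attorney_cost = review_hours * attorney_rate
    hosting_cost = total_gb * hosting_rate_per_gb_month * hosted_months
    return {
        "documents": round(documents),
        "review_hours": round(review_hours),
        "attorney_cost": round(attorney_cost),
        "hosting_cost": round(hosting_cost),
    }

estimate = review_costs()
for name, value in estimate.items():
    print(f"{name}: {value:,}")
```

With the default inputs this gives 195,000 documents, 1,950 review hours, $146,250 in attorney review fees and $7,500 in hosting. Changing the decision rate or hourly rate shows quickly how sensitive the review figure is to those two assumptions.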

If we tabulate these costs and calculate the direct hard dollar cost shares for each stage, the clear take-away is that Processing and Review costs comprise the vast majority of direct hard dollar costs.  Collection and Production direct hard dollar costs are significantly smaller in comparison.

EDRM Stage                      Hard Dollar Costs ($k)    Share
Collection                      10                        4%
Processing                      94                        36%
Review                          153                       58%
Production                      4                         2%
Total                           261                       100%
Total for Processing & Review   247                       94%

Now, it’s possible to come up with many arguments for why Mr. Deutchman’s or my estimates could be high, including different assumptions for attorney hourly review costs, higher document decision rates, cheaper vendor pricing, etc.  Similarly, it’s possible to come up with many arguments for why the estimates could be low, including the need to perform multiple review passes, slower document decision rates, more expensive vendor charges, etc.  In addition, each company will have its own unique circumstances that will change this picture.  However, this generic analysis strongly suggests that more customized analyses would come to the same conclusion: if you want to reduce electronic discovery costs quickly, then you need to focus on processing and review costs.  One can also imagine that even if you were to use some form of activity-based costing to allocate indirect hard dollar costs on a per matter basis, it would likely not change the importance of Processing and Review costs.

What does this mean for IT and legal managers in corporations?  These kinds of analyses make it pretty clear that, even though they are more involved in the Information Management, Identification, and Collection phases of electronic discovery, IT managers need to focus more on helping the legal team optimize Processing and Review activities.  You are not going to get the biggest bang for your buck in the short-term by trying to reduce costs in Information Management, Identification, Preservation, and Collection.  Similarly, legal managers need to work more closely with IT in order to focus on how to reduce processing and review costs.

So, the obvious question coming out of such an analysis is what’s the best way to reduce Processing and Review costs?  We’ll discuss this issue in a future post.

In the meantime, tell me what you think by participating in our first e-discovery 2.0 poll.  See the sidebar here: Which Phase of Electronic Discovery Do You Think is the Most Costly?

Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth.  The myth is that there is an agreed upon definition of concept search.  In actuality, when people in electronic discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not actually talking about concept search technology at all and are actually talking about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling.  First, culling needs to be transparent.  You need to be able to get agreement with or at least explain to the opposing side and the court exactly how you have culled the data set.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either agree on the set of search terms with the opposing side or to use e-discovery search best practices to perform keyword searches on your own.

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that your group of documents that have been categorized as being related to, for example, a company’s hiring policies include all of the documents in your matter related to hiring policies or that they do not include some documents that may not really be related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.
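The trade-off can be illustrated with a toy example. The sketch below is not a real concept search engine: it stands in for one with a hand-written synonym list, and the six documents and their relevance labels are invented purely for illustration.

```python
import re

# Toy corpus: (text, is_relevant_to_the_employment_matter).
# Documents and labels are hypothetical.
DOCS = {
    1: ("We decided to terminate his contract on Friday.", True),
    2: ("HR plans to fire two people in the sales team.", True),
    3: ("The board voted to dismiss the regional manager.", True),
    4: ("The fire department inspected the warehouse.", False),
    5: ("Please dismiss the calendar reminder.", False),
    6: ("Quarterly budget attached for review.", False),
}

# Hand-written stand-in for automatic concept expansion.
SYNONYMS = {"terminate": ["fire", "dismiss"]}

def search(terms):
    """Return the ids of documents containing any term as a whole word."""
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return {doc_id for doc_id, (text, _) in DOCS.items() if pattern.search(text)}

def precision_recall(hits):
    """Precision and recall of a hit set against the relevance labels."""
    relevant = {doc_id for doc_id, (_, is_relevant) in DOCS.items() if is_relevant}
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant)
    return precision, recall

keyword_hits = search(["terminate"])
concept_hits = search(["terminate"] + SYNONYMS["terminate"])

print("keyword:", sorted(keyword_hits), precision_recall(keyword_hits))
print("expanded:", sorted(concept_hits), precision_recall(concept_hits))
```

On this toy set the plain keyword search finds only one of the three relevant documents but returns nothing irrelevant, while the expanded search finds all three and also pulls in the fire department memo and the calendar reminder.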

So concept search can help address the under-inclusive problem with keyword search (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with content categorization.  These technologies have largely been designed as “black boxes”, which, as I have discussed in the past, makes sense for enterprise search but not for e-discovery search, and, as a result, could also be difficult to explain and defend.  For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.

“Aggressive Culling”: The E-Discovery Buzz Cut

Tuesday, September 30th, 2008

Ralph Losey, never one to mince words, recently analyzed a litigation survey from the elite Fellows of the American College of Trial Lawyers. The survey highlights the fact that one of the main problems facing the U.S. legal system today is (surprise!) e-discovery. Also (not) a surprise is that the study “places the blame squarely on poor rules, bad law, and judges”, while overlooking the role that lawyers play in the problem.

In his analysis, Ralph makes a number of insightful observations that should help lawyers move from being e-discovery troublemakers to being part of the solution. However, one of his key critiques is targeted not at lawyers but rather at the vendor community: “[E-discovery] is too expensive because lawyers and judges do not know what they are doing, and do not know how to properly cull and review email, and because clients are disorganized pack-rats. Many of the e-discovery vendors are also misinformed, but often they do know better; they just have no pecuniary interest in aggressive culling. Some may even seek to line their own pockets in inflated discoveries.”

As Ralph bluntly points out, pecuniary interests (translation: money) play a big role here, but so does risk reduction. Imagine you’re given the opportunity to process a 2 terabyte case all the way through to review. With the “funnel” of e-discovery costs placing the highest dollar per gigabyte value on the end of the process (i.e. review), what’s your incentive to cull aggressively at the beginning? Not much from a revenue perspective, certainly, but also not much from a risk perspective, particularly when you have sanctions and lawsuits on your mind and are thinking about the potential liability that you incur by excluding potentially relevant documents by using too broad a brush (or pair of garden clippers) in your pruning.

How do we move forward? As document volumes continue to grow, it’s clear that aggressive culling (with a few caveats which we’ll get to in a minute) is a critical tool for managing costs and improving case outcomes (let’s go out on a limb and define “improving” as producing fairer and more equitable rulings). However, in order to adopt more aggressive culling as a standard part of the electronic discovery process, the community has to come to terms with three things:

  • The Myth of Perfection: There may be perfect abs, but there is no perfect e-discovery. Organizations like the E-Discovery Institute are doing fantastic work to measure and improve the accuracy of electronic discovery efforts, but in the end it’s tough to make the argument that having 100 contract attorneys manually reviewing 10 million documents will necessarily produce a better overall e-discovery outcome than 10 specialized attorneys reviewing 200,000 documents that were aggressively (but thoughtfully) culled from the initial 10 million document set. There simply is no black and white set of rules that will lead to a perfect process.
  • The Benefit of Cost Control: Given that, it is in the best interest of everyone involved (yes, even vendors) to choose the most cost-effective process that provides a high likelihood of producing the information relevant to the case.  This means “saving your bullets” by not spending all of your e-discovery dollars up front in a case pursuing the perfection myth, but instead approaching discovery in an incremental fashion which can adapt to changing facts and circumstances as the matter unfolds. How, you may ask, do vendors benefit? They can become more strategic e-discovery advisors by working with counsel over the full lifecycle of a case, providing higher-value (and, by the way, more interesting and intellectually challenging) consulting services to help incrementally adjust and adapt the course of e-discovery. As Ralph puts it: “…Trial lawyers should accept that specialists in the field of e-discovery are a necessary evil. If an e-discovery specialist knows the field, they can save you money and take you out of the e-discovery morass faster and more reliably than a dozen new rules. The world today is too complex for one man or woman to do it all.”
  • The Value of Defensibility: Many of you likely winced at the term “high likelihood” in the previous point. “Sacrilege!” you cried. “I demand certainty!” First, go back and re-read the first point about the Myth of Perfection. Then, consider that a better way forward may be an approach to e-discovery that involves more aggressive culling early in the process to focus on the most important documents first, more iterations to adapt to changing facts and circumstances, and, all along the way, a complete audit trail that provides defensibility in the event that any aspect of the process is ever questioned. Such defensibility would include specific documentation about the culling decisions that were made, down to the keyword and “sub-keyword” (i.e. wildcard expansion) level, so all the cards are on the table for everyone to see.  The value of defensibility when performing aggressive culling is enormous, in that it adds an additional measure of safety and trust to the process, minimizing the amount of doubt and second-guessing that so often plagues e-discovery negotiations.
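As a rough illustration of what documentation down to the keyword and “sub-keyword” level might look like, here is a minimal sketch that culls a handful of documents with a wildcard term and records exactly which words the wildcard expanded to and which documents each keyword matched. The documents, the record layout and the function names are all hypothetical; real e-discovery platforms will log this differently.

```python
import fnmatch
import json
import re

def tokenize(text):
    """Lowercase a document and split it into words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def expand_wildcard(pattern, vocabulary):
    """List every word in the corpus that a wildcard keyword matches."""
    return sorted(w for w in vocabulary if fnmatch.fnmatchcase(w, pattern))

def cull(documents, keywords):
    """Keep documents matching any keyword and log every decision."""
    tokens = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    vocabulary = set().union(*tokens.values())
    kept, audit_log = set(), []
    for keyword in keywords:
        expansions = expand_wildcard(keyword.lower(), vocabulary)
        hits = sorted(doc_id for doc_id, words in tokens.items()
                      if words & set(expansions))
        kept.update(hits)
        audit_log.append({
            "keyword": keyword,
            "expanded_to": expansions,
            "matched_docs": hits,
        })
    return kept, audit_log

# Hypothetical three-document corpus.
documents = {
    "doc-1": "Terminated employees must return badges",
    "doc-2": "Lease termination notice attached",
    "doc-3": "Lunch menu for Friday",
}

kept, audit_log = cull(documents, ["terminat*"])
print(json.dumps(audit_log, indent=2))
```

The log makes the over-inclusive hit visible: “terminat*” expands to both “terminated” and “termination”, so the lease notice is kept alongside the employment document, and both sides can see why.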

By coming to terms with the fundamental imperfections of the e-discovery process and embracing the promise of lower costs and the agility and responsiveness that can be gained with a more iterative approach, everyone stands to gain from the safe and controlled adoption of aggressive culling – yes, even the vendors (at least the smart ones) and their ever-present pecuniary interests.