Posts Tagged ‘de-duplication’

Clearwell Expands Its E-Discovery Platform with New Modules for Pre-Processing, Review, and Production

Monday, August 17th, 2009

Earlier today, Clearwell announced Version 5.0 of its e-discovery platform. Unlike prior versions, which focused on processing, early case analysis, and first-pass review, this release extends Clearwell’s capabilities in two directions: upstream, by adding pre-processing; and downstream, by adding document-by-document review and production. I wanted to say a few words about what motivated these changes, and why the new release greatly increases Clearwell’s value to enterprises, government agencies, law firms, and litigation support service providers.

Over the past year, the benefits of early case analysis and first-pass review have driven hundreds of companies to adopt Clearwell. They have saved huge amounts of money and time, and often become evangelists for the product. But despite that, we continually hear that the overall e-discovery process remains expensive, unpredictable, and risky. When we investigated why, we found that the problem lies less in the features of the products being used than in the number of products used.

Once data is collected, a typical e-discovery process today may involve as many as four different tools: one for filtering by custodian or date range, another for de-duplication and keyword search, another for load file creation, and yet another for review and production. Each time data is handed off from one tool to another, there’s a risk that document counts do not tie out, data does not convert correctly, or any of a hundred other things go wrong. This risk is magnified by the fact that e-discovery is highly iterative: custodians are often added or keywords changed as new information comes to light, forcing people to redo many steps of the process. As a result, timelines are unpredictable and it’s hard to stick to a budget, even with extensive project management, which itself is not cheap.

Since the problem lies in the handoffs between different products, it’s impossible to solve it by making any one part of the process better. The only solution is to have a single product that can manage collected data from soup (filtering / pre-processing) to nuts (production). Prior to today’s announcement, that product did not exist: there was no single, integrated product that could do everything from processing data to reviewing and producing it. And that, in summary, is why Clearwell is releasing Version 5.0.

With Clearwell’s new product, there are no handoffs and no uncertainty about how long it will take to export data out of one tool and into another. There’s no need to cobble together a string of different products or train lawyers on multiple interfaces and workflows. As a result, the risks of cost overruns or missed deadlines are greatly reduced.

To our mind, this is just part of a natural evolutionary process that affects many markets, not just e-discovery. Who wants to carry a Palm Pilot, an iPod, and a mobile phone when you can carry a single device like the iPhone? Who wants a cable receiver and a TiVo when you can get both in a single set-top box? As markets mature, there develops a logical package of functionality that customers prefer to buy from a single, integrated provider.

You can sign up for a product demonstration at our website, or come see the product at ILTA next week (Booth 606). Take a look – and let us know what you think.

Electronic Discovery Services: The Price is Right?

Wednesday, June 17th, 2009

Maybe this will show my age, but I’ve been around the electronic discovery business since the days when pricing was both simple and very expensive. Terabytes were at the mythical high end of the spectrum, and a gigabyte of “e-docs” (not “ESI”) cost $3,000 to $4,000 to process. Understandably (and fortunately for most), pricing models have evolved, thanks in part to more educated consumers and initiatives such as Sedona’s RFP + Vendor Panel.

Leaving the WABAC machine and moving into present times, we’re starting to see some variance from traditional pricing models that primarily focus on data going “into” the processing machine. More and more companies (such as Kroll Ontrack) are moving to models that price on data coming “out” of the process. Since that’s a bit nebulous, an example might illustrate:

Traditionally, in a somewhat simplified fashion, an electronic discovery project would be priced by the amount of data in the initial corpus (say 100 gigabytes) and processing would be priced at $500 a gigabyte (for round numbers purposes). Leaving out the sometimes significant caveat that the 100 gigabytes would likely increase due to expansion of compressed files, this would mean that the bulk of the project expenses would be $50,000 ($500 x 100), plus relatively nominal costs for monthly hosting and user access rights.

At the end of the day, after elimination of system files, deduplication, and application of search terms (reducing the initial corpus by, say, 70% collectively), there would be 30 gigabytes remaining for hosting and possible production, both of which are most often priced separately.

Given rampant commoditization, there’s an arms race underway among certain service providers, who are now changing the above model to give away initial processing as a loss leader and price only on the data that comes out the end of the processing/search step. In this approach the above workflow would largely stay the same, but the vendor would charge a higher rate for what ultimately is hosted on the back end. If this back-end fee were $2,000 per resulting gigabyte and the same 30 gigabytes came out the back end, then the customer would pay $60,000 for the project. But if the deduplication, searching, culling, etc. were more effective (at, say, 80%), then the resulting 20 gigabytes would cost only $40,000.
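To make the comparison concrete, here is a minimal sketch of the two pricing models described above, using the post's round numbers (100 gigabytes in, $500 per input gigabyte, $2,000 per output gigabyte). The function names are hypothetical, and the sketch assumes no expansion of compressed files and ignores hosting, user access, and production fees.

```python
def input_priced_cost(input_gb, rate_per_input_gb):
    """Traditional model: pay for every gigabyte that goes into processing."""
    return input_gb * rate_per_input_gb


def output_priced_cost(input_gb, cull_pct, rate_per_output_gb):
    """Back-end model: pay only for what survives culling.

    cull_pct is the combined reduction from system-file removal,
    deduplication, and search terms, expressed as a percentage (0-100).
    """
    if not 0 <= cull_pct <= 100:
        raise ValueError("cull_pct must be between 0 and 100")
    remaining_gb = input_gb * (100 - cull_pct) / 100
    return remaining_gb * rate_per_output_gb


def break_even_cull_pct(rate_per_input_gb, rate_per_output_gb):
    """Culling percentage at which both models cost the same."""
    return 100 * (1 - rate_per_input_gb / rate_per_output_gb)


# Illustrative figures taken from the example in the text.
INPUT_GB = 100
INPUT_RATE = 500     # dollars per gigabyte in
OUTPUT_RATE = 2000   # dollars per gigabyte out

baseline = input_priced_cost(INPUT_GB, INPUT_RATE)
for cull_pct in (70, 75, 80):
    cost = output_priced_cost(INPUT_GB, cull_pct, OUTPUT_RATE)
    print(f"cull {cull_pct}%: output-priced ${cost:,.0f} "
          f"vs input-priced ${baseline:,.0f}")
```

With these rates the two models break even at a 75% culling rate: below that the output-priced model costs more than the $50,000 baseline, and above it the customer comes out ahead. Every percentage point of culling is worth $2,000 in either direction.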

The question then, as Clint Eastwood would put it, is: “Do you feel lucky?” This pricing model forces attorneys and litigation support managers to guesstimate what culling, search, and de-duplication rates they’ll likely get on the data corpus. Guess right and they save the end client money; guess wrong and they’re way over budget.

The dynamics of this purchasing decision are a bit atypical because the buyer (usually counsel) doesn’t pay the bills, so the decision can often be more vexing than most. When a direct consumer gambles on pricing, things will ideally balance out over time, with money saved in some instances and overspent in others. But when the buyer doesn’t pay the bills, the motivation is less clear.

Thoughts run to Maslow’s hierarchy of needs to determine which pricing model is ultimately more compelling: (a) price certainty/adherence to budget, or (b) cost variability and the opportunity to save money. While it’s never good to understate the upside of saving money (Esteem), I think ultimately there’s a more fundamental need (Safety) to stay within budget and avoid the painful (sometimes client-imperiling) call to discuss how a given e-discovery project has gone way over budget.

This calculation is made even more vexing because it not only pits the purchasing party against unknown data culling/searching rates, but also puts the vendor in an ethical bind: the vendor makes less money when it is supremely effective at data reduction, and stands to make considerably more when, intentionally or accidentally, relatively little data is culled.

It’s like you went to Vegas to gamble your kid’s college fund and on top of the already questionable house odds you knew that the dealer stood to profit by your losses. So, as for myself, no, I don’t feel lucky.