E-Discovery Processing: You Get What You Pay For
by Kurt Leafstrand on May 6th, 2008
Anyone reading today’s announcement from Kazeon could be forgiven for doing a double-take: did someone misplace the decimal point? Kazeon claims that it can perform “processing of ESI in preparation for eDiscovery matters as low as $4.30 per Gigabyte.” Assuming that’s not simply a typo, it raises an obvious question: If Kazeon really can process information at a tiny fraction of what e-discovery service providers are charging, how come every e-discovery service provider isn’t going out of business? Why wouldn’t everyone take this incredibly good deal?
The answer (in press releases, as in politics) lies in definitions. Exactly what sort of processing would you be getting for your four dollars and change?
You’ll have to ask Kazeon to get the answer to that one, but give a venti latte to a bleary-eyed e-discovery service provider who’s just pulled an all-nighter preparing for a meet-and-confer, and they’ll tell you all about the nuances, complexities, and risks inherent in e-discovery processing that may be difficult for enterprise search/information lifecycle management vendors to grasp. Quite likely, they will refer you to EDRM’s processing node overview, which outlines the basic goals of robust processing:
- Capture and preserve the body of electronic documents;
- Associate document collections with particular users (custodians);
- Capture and preserve the metadata associated with the electronic files within the collections;
- Establish the parent-child relationship between the various source data files;
- Automate the identification and elimination of redundant, duplicate data within the given dataset;
- Provide a means to programmatically suppress material that is not relevant to the review based on criteria such as keywords, date ranges or other available metadata;
- Unprotect and reveal information within files; and
- Accomplish all of these goals in a manner that is both defensible with respect to clients’ legal obligations and appropriately cost-effective and expedient in the context of the matter.
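To make a couple of those goals concrete, here is a minimal sketch of duplicate elimination that still keeps track of every custodian who held a copy. It is only an illustration: the document layout (dicts with `name`, `custodian`, and `content` keys), the `deduplicate` function, and the choice of SHA-256 over the raw bytes are all assumptions made for this example, not anything EDRM or any vendor prescribes. Real processing also has to deal with metadata, families of parent and child documents, and near-duplicates, none of which this attempts.

```python
import hashlib
from collections import defaultdict


def deduplicate(documents):
    """Keep one copy per distinct content hash and record every custodian.

    `documents` is a hypothetical list of dicts with the keys
    "name", "custodian", and "content" (bytes).
    """
    first_copy = {}                 # digest -> first document seen with that content
    custodians = defaultdict(set)   # digest -> all custodians holding a copy

    for doc in documents:
        digest = hashlib.sha256(doc["content"]).hexdigest()
        custodians[digest].add(doc["custodian"])
        first_copy.setdefault(digest, doc)

    return [
        {
            "sha256": digest,
            "name": doc["name"],
            "custodians": sorted(custodians[digest]),
        }
        for digest, doc in first_copy.items()
    ]


# Made-up sample data: two custodians hold byte-identical copies of one file.
sample = [
    {"name": "budget.xls", "custodian": "alice", "content": b"Q1 numbers"},
    {"name": "budget-copy.xls", "custodian": "bob", "content": b"Q1 numbers"},
    {"name": "memo.doc", "custodian": "bob", "content": b"Meeting at noon"},
]

result = deduplicate(sample)
for entry in result:
    print(entry["name"], entry["custodians"])
```

Even in this toy version, the duplicate is suppressed without losing the fact that both custodians had the file, which is the kind of detail that matters when the processing has to be defensible.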
And that’s just the high-level overview. After the caffeine from the latte starts to kick in, they’ll tell you it’s also absolutely critical to:
- Provide statistical count tie-outs that reconcile every incoming email, loose file, and attachment with the processed document set
- Automatically scan critical large container files (such as PSTs) for errors and problems prior to processing
- Automatically perform custodian mapping to track ownership of all documents
- Maintain detailed reports on every anomaly encountered during processing, down to the individual email, loose file, and attachment
- Automatically handle common metadata anomalies (with logging) so that the maximum number of documents are made available for review
- Provide robust and thorough handling for container files regardless of container format
- Support non-email content types such as contacts, calendar entries, tasks, and notes
- Robustly handle embedded objects
- Provide full visibility into exceptions encountered during processing, along with an integrated exception handling process to allow repaired/decrypted data to be easily added back into the document set
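The first item on that list, the count tie-out, boils down to simple arithmetic: everything that came in must show up either in the processed set or in the exception log. A rough sketch follows, with the caveat that the category names, the `tie_out` function, and all of the counts are invented for illustration and do not come from any particular tool.

```python
from collections import Counter

# Hypothetical item categories, mirroring "email, loose file, and attachment".
CATEGORIES = ("email", "loose_file", "attachment")


def tie_out(incoming, processed, exceptions):
    """Return, per category, how many incoming items are unaccounted for.

    An item is accounted for if it was either processed or logged as an
    exception, so the gap is incoming - processed - exceptions.
    """
    incoming = Counter(incoming)
    processed = Counter(processed)
    exceptions = Counter(exceptions)  # missing categories count as zero
    return {c: incoming[c] - processed[c] - exceptions[c] for c in CATEGORIES}


# Made-up counts for a single collection.
incoming_counts = {"email": 1200, "loose_file": 300, "attachment": 450}
processed_counts = {"email": 1190, "loose_file": 300, "attachment": 445}
exception_counts = {"email": 10, "attachment": 4}

gaps = tie_out(incoming_counts, processed_counts, exception_counts)
unreconciled = {c: n for c, n in gaps.items() if n != 0}
print(unreconciled)  # {'attachment': 1}
```

In this example one attachment has gone missing somewhere between intake and review, and that single unexplained item is exactly what a tie-out report exists to surface.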
All that for under five bucks? That’s quite a deal! But remember, if you drive by your corner gas station tomorrow morning and they’re advertising regular unleaded for 20 cents a gallon, it may be cheap, but it’s probably not gas you’re getting.





