Posts Tagged ‘e-discovery software’

The “Artful” E-Discovery Dodger

Monday, October 13th, 2008

E-Discovery search has become a hot topic of late (in blogs and in the news), and I think it’s pretty clear that the unwashed (attorney) masses still don’t really grok the importance of using a defensible search protocol.  Neither do they seem to understand the enhanced scrutiny that’s being applied by the judiciary.

Kipperman v. Onex Corp., 2008 WL 4372005 (N.D. Ga. Sept. 19, 2008) is another in what will assuredly be a long string of cases that demonstrate how easy it is for litigators to get wrapped around the axle of e-discovery search.  In Kipperman, the defendant (Onex) presented several motions to the court, including attempts to obtain relief from the need to produce email identified after searching several backup tapes.

During a previous hearing the court ordered Onex to search all the mailboxes on two tapes, as well as on an additional tape selected by Plaintiff. The court determined that despite Onex’s objections and representations, the backup tapes were “producing meaningful discoverable information.”  The court was nevertheless sympathetic to Onex’s burden and therefore weighed in with some guidance:

“The court did suggest, … , that Plaintiff be more artful with its search terms and that Plaintiff utilize a list of the people, provided by Defendants, to review whether all mailboxes needed to be searched.”

The court also gave Onex the chance to narrow the search terms.  Unfortunately, they didn’t seize the opportunity to provide a narrower list or a refinement of their search terms.  “As such, they agreed to search and restore all the mailboxes with the search terms provided by Plaintiff.”

Not surprisingly, Onex then sought relief from having to review and produce all of the results from the search because the “broad search terms resulted in thousands and thousands of irrelevant hits.”  For example, the search terms included the word “republic,” which was used to elicit emails regarding Republic Builders Products, one of the companies involved in this matter.

“Defendants claim that the search captured thousands of irrelevant pages due to one occurrence of the word ‘republic’ often related to Onex business interests having nothing to do with Magnatrax in the ‘Republic of France,’ ‘Republic of Ireland,’ and ‘Czech Republic’.”
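To make the idea of a more “artful” search term concrete, here is a minimal sketch in Python. The sample emails, the exclusion list, and the function name are all hypothetical and are not drawn from the Kipperman record; the sketch only shows how a bare “republic” hit could be screened for the country-name phrases the defendants complained about.

```python
import re

# Hypothetical country-name phrases that the bare term "republic" also hits.
COUNTRY_PHRASES = re.compile(
    r"\b(?:republic of (?:france|ireland)|czech republic)\b", re.IGNORECASE
)
REPUBLIC = re.compile(r"\brepublic\b", re.IGNORECASE)


def is_artful_hit(text: str) -> bool:
    """Return True if "republic" appears outside the country-name phrases."""
    # Blank out the country phrases first, then look for any remaining hit.
    remainder = COUNTRY_PHRASES.sub(" ", text)
    return REPUBLIC.search(remainder) is not None


# Hypothetical email subject lines used only for illustration.
emails = [
    "Pricing update from Republic Builders Products",
    "Travel plans for the Czech Republic office visit",
    "Republic of Ireland tax memo; cc Republic Builders Products",
]
hits = [e for e in emails if is_artful_hit(e)]
```

In this sketch the second email drops out, while the third is kept because it still mentions the company after the country phrase is set aside. A real protocol would need its exclusion list negotiated and sampled rather than guessed.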

Again the court reaffirmed its sympathy with Onex’s burden and yet denied the requested relief, in large part because Onex was warned about not being more “artful”:

“[T]he court is not unsympathetic to the massive amount of discovery involved in this matter, the considerable burden of working with it, and the overproduction that often comes with e-mail production. Therefore, the court gave Defendants numerous tools by which to reduce the burden of e-mail discovery, including an opportunity to limit Plaintiff’s search terms and an opportunity to provide a list by which the number of peoples and the number of boxes being searched could be reduced. Defendants did not take advantage of these opportunities. Defendants must now lie in the bed that they have made. Thus, Defendants’ objections on the basis of relevancy and volume are DENIED.” (emphasis added).

Needless to say, Kipperman is probably not all that atypical.  Attorneys everywhere have historically used blunt e-discovery search instruments and haven’t often run afoul of the judiciary.  Now, post Victor Stanley, et al, the playing field has changed dramatically.  It’s important to leverage best practices (from Sedona and others), craft a defensible search strategy, sample the results and “show your work.”  Missteps along the way, especially ones that the court has tried to help the parties avoid, won’t be met with much tolerance.

E-Discovery In The Press

Thursday, October 2nd, 2008

Last month, for the first time, friends of mine who do NOT work in the legal industry started talking to me about e-discovery. In the past, they had always taken on the glazed look of a bored 8th-grader whenever I spoke about what I do. But suddenly, they were strangely interested and full of questions.

The reason was two articles about e-discovery in the mainstream media which appeared within a week of each other. The first was in the Wall Street Journal, which wrote about how tech firms are at war with lawyers. According to the Journal, the fact that companies are saving money by using e-discovery software is bad news for lawyers, since they are “facing the loss of lucrative client fees.” In response, the lawyers are fighting back: “The attorneys counter that there are pitfalls to replacing them. Early this year, a federal judge required chip maker Qualcomm to pay rival Broadcom more than $8 million after it failed to uncover and share emails relevant to a case.”

I am sure there are lawyers who see technology as a threat, but the firms I deal with are actively embracing e-discovery technology, not fighting it. They see it as another way they can add value to their clients, and would prefer to have their staff focused on practicing law, not mindlessly reading irrelevant documents. So I ended up spending a lot of time explaining to my non-legal friends that there are two sides to the coin. As for my friends who do happen to be lawyers, they focused on the Qualcomm case, pointing out (as we have written before) that the problem was not technology, but rather poor processes and bad judgment on the part of the attorneys concerned.

The second article appeared in the Economist and took a different tack. It argued that the stratospheric cost of e-discovery is gumming up the court system and preventing justice from being served. According to one former justice from Colorado quoted in the article, even mundane landlord-tenant disputes “are now digital wars of attrition”; there are “cases that are settled only because one party cannot afford the costs of e-discovery”; and, many “plaintiffs cannot afford to sue at all, for fear of the e-discovery costs.”

I love the Economist’s tongue-in-cheek style and thought the article made many valid points. My one disappointment was that its spin was unequivocally negative, as though e-discovery is a self-inflicted wound on the American judicial system. Nowhere was there mention of the fact that electronic evidence often helps litigants get at the truth. Rather than incomplete recollections or “he said-she said” claims and counter-claims, there’s no disputing an email that captures a person’s words and actions in black-and-white. Nor was there any mention of how technology is solving the problems that it inadvertently created: today, there are many products that rapidly sift through electronic information, dramatically lowering the cost of e-discovery.

It is great for everyone in the e-discovery community for our domain to get more ink in mainstream, quality publications. I expect that the trend will continue as the industry grows, and especially once the investigations start into our current financial meltdown.

Judge Grimm, Victor Stanley, And The Problem Of “Black-Box” E-Discovery Search

Friday, August 22nd, 2008

Judge Paul Grimm’s recent opinion in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008) provides valuable guidance on one of the most important issues in e-discovery: how to conduct keyword searches in a defensible manner given that keyword searches are prone to produce over- and under-inclusive results.  The ruling suggests one of two approaches: either producing parties should adopt a “collaborative” approach to conducting keyword searches, whereby each party agrees on a search methodology; or, they should use a “best practices” approach, such as the one suggested by Sedona, where the producing party tests, samples, and iteratively refines searches so that they can demonstrate they have taken reasonable measures to reduce over- and under-inclusive results.

While the guidance is clear, following the guidance in practice is very difficult.  The primary reason for this is that the search technology being used in e-discovery today is not up to the task.  Specifically, today’s search technology suffers from three problems:

  1. The over- and under-inclusive tradeoff. Many technologies have been developed to address the tendency of keyword searches to miss relevant documents and produce under-inclusive results.  Wildcard and stemming technology has been developed in order to address the issue of finding common word variations in specified keywords.  Concept search has been designed to find documents containing words with similar meanings to the keywords in a search.  And fuzzy search technologies have been put in place to find misspellings of words. However, all of these suffer from the same problem: they produce too many non-relevant or “false positive” documents thus driving up the cost of review. For example, if someone runs the wildcard search “divers*”, then he or she not only gets the desired documents containing “diverse” and “diversity”, but also gets a large number of false positive documents containing “diversion”, “diversification”, and so on.  In the case of concept and fuzzy search, the problem is so great that these technologies to date have rarely been used in e-discovery.
  2. Too expensive to test, sample and refine searches. Today’s search technologies are largely designed to run one search at a time, not the dozens of searches that are typical in e-discovery. As a result, anyone trying to follow the best practices of testing, sampling, and refining each search will find themselves missing deadlines and running over budget because it takes so long. This also makes collaboration with the opposing party close to impossible, since there’s little time to iterate on, and agree upon, a set of keyword searches.
  3. Manual documentation. It’s not enough for producing parties to use best practices; they have to document them so that they can “show their work” to the court. Currently, documenting the search refinement process is mostly manual, with the result that it is either done inadequately or not at all.
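As a rough illustration of the second and third problems, the sketch below (in Python) draws a reproducible random sample from a set of search hits and records each search iteration in a simple log. Everything in it is an assumption made for illustration: the `SearchLog` class, the `sample_hits` helper, the document IDs, and the reviewer's count of relevant documents are hypothetical and do not describe any particular product.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SearchLog:
    """Running record of each search iteration, so the work can be shown later."""
    entries: list = field(default_factory=list)

    def record(self, query, hit_count, sample_size, relevant):
        # Estimated precision = relevant documents found in the sample / sample size.
        precision = relevant / sample_size if sample_size else 0.0
        self.entries.append({
            "query": query,
            "hits": hit_count,
            "sampled": sample_size,
            "relevant_in_sample": relevant,
            "est_precision": precision,
        })
        return precision


def sample_hits(hit_ids, sample_size, seed=0):
    """Draw a reproducible random sample of document IDs for manual spot-checking."""
    rng = random.Random(seed)
    ids = list(hit_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


log = SearchLog()
hits = range(1000)            # hypothetical document IDs returned by one search
sample = sample_hits(hits, 50)
relevant = 10                 # hypothetical count from a reviewer's spot-check
precision = log.record("divers*", len(hits), len(sample), relevant)
```

A fixed seed is used here so the same sample can be regenerated if the methodology is later challenged; whether that is appropriate in a given matter is a judgment call for counsel.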

The reason why the search technology used for e-discovery has these problems is surprisingly simple: it’s because the technology was not designed for e-discovery in the first place. Rather, it was built for enterprise search, and was only later repurposed towards e-discovery.

The “Black Box” Of Enterprise Search

The core issue is that enterprise search technology has been designed to be a “black box”. Users enter a single search query into one end, and get results at the other, with no visibility into what happens in between. Going back to our previous example, when a user searches for “divers*” intending to find documents related to “diversity” or “diverse”, enterprise search engines give the user no visibility into the crucial step of query expansion and how it expands the search query into relevant and non-relevant terms like “diversion” and “diversification”. As a result, the user has no ability to minimize the false positives.
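To show what visibility into query expansion could look like, here is a minimal sketch in Python. The vocabulary list and the `expand_wildcard` helper are hypothetical stand-ins for an index of the document set; the only library call used is the standard library's `fnmatch.fnmatchcase`.

```python
import fnmatch

# Hypothetical vocabulary drawn from an index of the document set.
vocabulary = ["diverse", "diversity", "diversion", "diversification", "driver"]


def expand_wildcard(pattern, vocab):
    """List every indexed term that a wildcard pattern would match."""
    return sorted(t for t in vocab if fnmatch.fnmatchcase(t, pattern))


expanded = expand_wildcard("divers*", vocabulary)

# With the expansion exposed, a reviewer can strike unwanted variants
# before the search is run instead of discovering them during review.
unwanted = {"diversion", "diversification"}
final_terms = [t for t in expanded if t not in unwanted]
```

Here `expanded` lists all four “divers” variants, and `final_terms` is left with just “diverse” and “diversity”.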

In the same vein, when a user enters multiple queries into a “black box” enterprise search engine, all of the queries run as a single search, and the user has no visibility into which results are associated with which query. For example, a user that searches for “hiring OR interview” will get the results for the combination of the queries “hiring” and “interview”. He or she won’t know that only 5 documents contained “hiring” while 100 documents contained “interview.”  This limitation makes analyzing, sampling and refining searches costly and time consuming.
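The difference between a combined result and a per-query breakdown can be sketched in a few lines of Python. The four documents and the `hits_per_term` helper are hypothetical, and matching on whitespace-separated words is a simplification chosen only to keep the example short.

```python
# Hypothetical mini document set, keyed by document ID.
documents = {
    1: "Notes from the interview with the candidate",
    2: "Hiring plan for next quarter",
    3: "Interview schedule and hiring committee list",
    4: "Lunch menu",
}


def hits_per_term(terms, docs):
    """Map each search term to the set of document IDs containing it."""
    report = {}
    for term in terms:
        report[term] = {
            doc_id
            for doc_id, text in docs.items()
            if term.lower() in text.lower().split()
        }
    return report


report = hits_per_term(["hiring", "interview"], documents)

# The "black box" view: one merged result set for "hiring OR interview".
combined = set().union(*report.values())

# The transparent view: how many documents each term contributed.
counts = {term: len(ids) for term, ids in report.items()}
```

The merged set alone hides which term is pulling in the volume; the per-term counts are what make it possible to decide which term to refine first.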

That’s not to say that enterprise search products like Autonomy or Endeca are flawed. Far from it.  Their “black box” design works exceedingly well for the simple and quick queries that people want to run across the enterprise for general business purposes. If a sales manager is looking for a single proposal for her meeting the following day, then she doesn’t care how the search was performed or if it’s over-inclusive.  She’s only interested in the first page of relevant results, and for that use case enterprise search engines do a great job.

But e-discovery is a whole different world.  In e-discovery, users typically must review every single document in the search results, not just the most relevant ones.  As a result, over-inclusive searches can dramatically increase the costs of downstream production and review.  And under-inclusive searches raise the issue of defensibility.  Finally, e-discovery users have to run a lot of search queries and understand which documents are associated with each of those queries.

So, going back to the original problem, if current search technologies cannot help lawyers and litigation support professionals follow Judge Grimm’s guidance and address the “well-known limitations” of keyword search, what can? That will be the subject of my next post.

Socha-Gelbmann Survey For 2008 Highlights Shifting Landscape In E-Discovery Software

Thursday, July 24th, 2008

Yesterday, George Socha and Tom Gelbmann published summary results for their 2008 EDD survey. George and Tom gathered self-reported data from 85 e-discovery service providers and 40 e-discovery software companies. To help vendors resist the temptation to “exaggerate” their accomplishments, they then cross-referenced the responses against independent surveys submitted by 29 law firms and 19 corporations, and applied a healthy dose of their own good judgment. The outcome, which they will publish in full next month, is a great snapshot of the industry, and probably the most objective ranking of e-discovery vendors that you can find.

By comparing this year’s results to the 2007 survey, you get a sense for how much has changed in the e-discovery world over the past 12 months:

Top E-Discovery Software Companies

[Image: software.jpg]

Note: arrows show change to rankings from last year’s Socha-Gelbmann Survey

Autonomy and Clearwell move up to the Top 5, overtaking Attenex and CT Summation which slip back to the second tier. There are also 3 new names ranked 6 through 10 (Epiq, iConect and Symantec) who displace Cataphora, Doculex, ISYS, and Oracle, none of whom even make it into the top 15. In other words, 70% of the rankings have changed since last year.

If a litigation support manager were to focus only on the Top 5 in making her e-discovery software decision, she would have a choice of some very different solutions. Autonomy positions itself as a high-end (expensive) platform for corporations, while Lexis offers a comprehensive toolset for law firms. Guidance and Clearwell are complementary in that both provide best-of-breed solutions for parts of the EDRM model: Guidance is the leader in collection and preservation, while Clearwell is the leader in processing, analysis and review. Finally, FTI takes a services-based approach which centers around RingTail, its hosted review application.

Looking lower down the list, there were some other interesting results, primarily around which companies were NOT ranked. Kazeon made it into the third tier (ranked 11-15) whereas StoredIQ, its main competitor, did not. Nor did Recommind break into the rankings, despite making a major push into e-discovery from knowledge management over the past year. But the most striking absentees are PSS Systems and Exterro, which have pioneered litigation hold management for Fortune 100 companies. I can only guess that they cover too much of a niche market to warrant inclusion in an industry-wide report.

Top E-Discovery Service Providers

In contrast to the world of software, e-discovery services saw much less movement in this year’s rankings:

[Image: service-providers.jpg]

Note: arrows show change to rankings from last year’s Socha-Gelbmann Survey

There was only one change to the top 5: Fios moved up, displacing Guidance which plummeted 10-20 places down to a 16-25 ranking. In addition, there were two new players in the top 10, Epiq and Huron, who edged out Electronic Evidence Discovery and Ernst & Young.

Conclusion

Changes to the software rankings reflect broader changes in the e-discovery market. As e-discovery has moved in-house, corporations have become a major driver of purchase decisions that were previously left to law firms. Many software companies, such as Attenex, have struggled to make this transition, while others, such as Clearwell, have capitalized on it. There has been no such change in the service provider world and, as a result, the rankings are relatively stable.

It will be interesting to see what happens next year. Every other software space is dominated by a small number of players, like Oracle for databases or VMware for virtualization. If the same is true for e-discovery, then we can expect many fewer changes to the software rankings in future surveys as the leaders pull away from the pack.

Review-less E-Discovery Review

Monday, July 21st, 2008

Most science fiction visions of the distant future seem to contain a rather singular fear: that the human race will be taken over by computers.  Think “Terminator” series, preferably without the naked Arnold Schwarzenegger visual.  Regardless of whether this vision fills you with trepidation or excitement, there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.

For the past several decades, attorneys have had to manually review information for relevancy and privilege in response to the e-discovery process.  Quoting from Information Inflation: Can the Legal System Adapt? by George Paul and Jason Baron, this task has always been viewed as sacrosanct “because of ‘death penalty’ waiver doctrine that evolved long ago when information was still manageable.”

Like so many industries, the legal profession has attempted to grapple with the transformation that the digital revolution has brought to the forefront.  The latest revisions to the Federal Rules of Civil Procedure (FRCP) are the most obvious case in point.  And yet, electronically stored information (ESI) is proving difficult to fit into traditional, even remodeled, paradigms.  Even ignoring (for the moment) the proliferation of novel data types (e.g., blog content, voice over IP or VOIP, webmail, text messaging, web services, etc.) the amount of data that attorneys are being required to review has reached a tipping point of review feasibility.

Back in the day, information was measured in banker’s boxes, and even in the most document-intensive discovery matters this measuring stick sustained the belief that armies of attorneys could conceivably conquer the massive document review problem.  But now, we often see clients that process routine matters containing terabytes of information.  Most of us in the e-discovery space have become numbed to the abstract nomenclature of megabytes, gigabytes, terabytes[i], petabytes[ii], and in the process we may have failed to realize that we have moved well beyond the scale of information that can be reasonably attacked with even the largest armada of contract attorneys (assuming that the client could conceivably bear the astronomical costs).

“At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.”[iii]

I’m certainly not the first to point out that this tipping point is coming, but now we are really starting to see early adopters respond to this sea change. In the article quoted above, George Paul and Jason Baron state “It is no exaggeration to say that litigation, as we have known it, is threatened by information’s new hyper-flow. The amount of electronically stored information relevant to a case is already a stress point in litigation.  […]  Litigators can no longer depend on manual review alone….”

Up until now, attorneys and the clients that are footing the bill have had to make a Hobson’s choice:  either “force parties to continue hugely expensive privilege reviews, or to forego the attorney-client privilege or work-product privilege altogether.”   But, now it appears that another way is evolving.

The following lays out a scenario where a non-manual review methodology may make sense.  ***Please note: this approach is not without risk.  At this moment in time neither clawback provisions, the potential adoption of Evidence Rule 502 nor any other known prophylactic measure can completely insulate a producing party from the unforeseen consequences of an inadvertent disclosure.  But, as they say, desperate times call for desperate measures….

Step one: Evaluate the Environment

The following factors represent some of the elements that should be taken into consideration prior to skipping the normal, human based review steps that are seen in most e-discovery matters.

  1. Large data set.  This may sound a bit obvious, but a non-manual approach is best suited for large, unwieldy data sets.  The corpus doesn’t need to be in the terabytes, but the data set should be evaluated in terms of discovery processing costs and attorney review estimates.
  2. Short Production Timelines.  Once the above calculations are conducted, the next step is to determine if a human based review could even conceivably be conducted in the given time frame.  In many instances, an eyes-on review process just won’t be feasible since there won’t be enough bodies to throw at the problem.
  3. Next Gen “PAR” Tools.  In order to pull this “review-less” review process off, both safely and quickly, the responding party needs to have access to fast, robust processing, analysis and review (“PAR”) tools.  Certainly, it’s possible to have this scenario work with an e-discovery service provider, if they have the capability.
  4. Relatively Small Amount in Controversy.  For the time being, this approach should not be considered for any “bet the company” litigation, nor anything with significant downside risk (governmental inquiries, punitive damages, class actions, 2nd requests, etc.).  Yet, for many standard commercial lawsuits, corporate investigations, HR claims, etc. this review-less approach may be worth considering.
  5. Ability to Use a Clawback Provision.  Entering into a clawback provision with the opposition is mandatory in this methodology since the chances of an inadvertent production are statistically ever-present.  Yet, until Evidence Rule 502 is resolved, there will always be a risk that the clawback won’t be enforceable against 3rd parties.
  6. Non-governmental Production.  Most information in governmental productions becomes part of the public record, meaning that a clawback isn’t going to be feasible.  Here, trade secret information, personally identifiable data and the like would be disastrous if pushed out into the public domain.
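The cost and timeline factors in the list above lend themselves to a back-of-the-envelope calculation. The Python sketch below uses the per-terabyte page and cost estimates cited in the footnote on terabytes; the review rate, working hours, and reviewer headcount are hypothetical planning assumptions, as is the `review_estimate` function itself.

```python
import math

PAGES_PER_TERABYTE = 75_000_000   # estimate cited in footnote i
COST_PER_TERABYTE = 18_750_000    # dollars, estimate cited in footnote i
COST_PER_PAGE = COST_PER_TERABYTE / PAGES_PER_TERABYTE  # implied per-page cost


def review_estimate(terabytes, reviewers, pages_per_hour=100, hours_per_day=8):
    """Rough cost and calendar estimate for an eyes-on review.

    pages_per_hour and hours_per_day are hypothetical planning assumptions.
    """
    pages = terabytes * PAGES_PER_TERABYTE
    cost = pages * COST_PER_PAGE
    # Working days needed if every reviewer reads at the assumed rate.
    days = math.ceil(pages / (reviewers * pages_per_hour * hours_per_day))
    return cost, days


# Hypothetical matter: one terabyte reviewed by a team of 100 attorneys.
cost, days = review_estimate(terabytes=1, reviewers=100)
```

Comparing the resulting number of working days against the production deadline is one quick way to test whether an eyes-on review is even feasible before moving on to the risk/benefit analysis.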

Step two: Perform a Risk/Benefit Analysis

Next, take all the above factors into consideration and determine if the risks (of inadvertent production, the clawback being ineffective, etc.) are worth the benefits (reduced costs, lower attorney review fees, ability to meet deadlines, etc.).

Sure, this is hard work, but the alternative (manual review) is more illusory than realistic.

[In my next post, I’ll address the tactical steps to conduct a review-less review process.  Stay tuned……]

[i] One terabyte is generally estimated to contain 75 million pages and could conceivably cost $18,750,000 to review.  Anne Kershaw, Automated Document Review Proves Its Reliability, 5 DIGITAL DISCOVERY & E-EVIDENCE 11 (2005).

[ii] According to Wired, we’re now in the “Petabyte Age” where that amount of data is processed by Google’s servers every 72 minutes.

[iii] Wired article, above.