Archive for May, 2008

E-Discovery Search: EDRM’s Next Frontier

Friday, May 23rd, 2008

Earlier this month, I attended the 4th annual EDRM kickoff meeting in Saint Paul. For those unfamiliar with it, the E-Discovery Reference Model (EDRM) project is one of the foundational organizations in electronic discovery, started in 2005 by George Socha and Tom Gelbmann. It has grown rapidly over the past 3 years and now boasts a roster of 80+ companies that represent the best and brightest in the e-discovery world, both vendors and (increasingly) practitioners.

What I love about EDRM, and what differentiates it so much from other (equally worthy) groups that focus on issues around electronic discovery, is its relentless focus on solving practical problems, usually in a very concrete and actionable way. There’s clearly a place for deep thinking about e-discovery case law and strategizing about best practices for e-discovery policies and practice, but Tom and George saw a growing need for a complementary group that focused on e-discovery nuts-and-bolts, with a particular emphasis on technology-based solutions. If nothing else, EDRM’s rapid growth has certainly proven the merit of their idea.

So, like the mighty Mississippi River that flows by a few hundred yards from the doors of the St. Paul Hotel, when you dive into EDRM, you know it’s going to take you somewhere. There is exciting follow-on work that is happening across a range of current projects, most notably Evergreen, XML, Metrics, and Code of Conduct. But what I found most exciting this year is a completely new project that will be tackling one of the most important and challenging aspects of e-discovery: search.

Launched by a grassroots group of conference participants, the Search Project seeks to tackle critical problems around e-discovery search from a practical, actionable perspective. Given the criticality of search in e-discovery, it is something that many participants felt was sorely needed.

For example, the project is discussing how to develop a common language for talking about search across various vendor implementations. It’s also tackling the problem of ensuring consistency across searches in different steps of the EDRM process. This whole area is ripe for innovation, and I am delighted to see the e-discovery community collaborating to make more progress than any one participant could alone.

Google Moves E-Discovery To The Cloud

Monday, May 19th, 2008

There is no bigger idea in enterprise technology than “cloud computing”. What does it mean? Simply put, the idea is that enterprises will cease to buy hardware, software, and all the headaches that come with them. Instead, companies will rent whatever applications they need and access them over the internet. Software vendors will keep their applications on a pool of shared infrastructure (the “cloud”), which will automatically allocate resources between applications according to demand. Using a common analogy, we will move from today’s world where companies are buying and building their own electricity generators, to a world where there are power companies distributing electricity over a grid.

To get a sense for how this might happen, just take a look at the CRM market. Ten years ago, Siebel and other packaged software vendors were among the fastest growing companies in America. Today, they are shrinking as customers migrate en masse to, for example, salesforce.com’s cloud-based approach. One Wall Street analyst I spoke to last week forecast that hosted (i.e., cloud-based) applications will grow their market share from 12% to 21% by 2011, and account for all growth in the market.

E-discovery is no exception to this mega-trend, and I expect a portion of the e-discovery software business to move to the cloud. How quickly this happens depends on how easy it is for companies to adopt cloud-based e-discovery solutions, which is why Google’s recent moves into e-discovery are so significant.

Google is by far the largest cloud computing company in the world. Its cloud-based Google Apps suite of applications was only launched in 2007, but is already being used by several hundred thousand businesses and, Google tells me, 2,000 new businesses sign up every day. Today, the customers are mainly small to medium-sized businesses (500-5,000 employees). But as its functionality improves, larger companies will increasingly start asking why they should pay for Microsoft Office when cheaper alternatives exist.

Talking to Bill Kee, a product marketing manager at Google, it’s clear the biggest gap in Google Apps’ functionality was the lack of enterprise features around security, compliance, and e-discovery. That’s why Google acquired Postini, a leader in messaging security. It’s why Google recently launched Message Discovery, a hosted archive that comes bundled into Google Apps Premier Edition. And it’s why Google is collaborating with Clearwell to educate the market on cloud-based e-discovery solutions.

If you are interested in learning more about “e-discovery in the cloud”, register for a free webinar which we are hosting with Google on June 3.

Court Asks Meet & Confer Participants Re. E-Discovery: Is That Your Final Answer?

Thursday, May 15th, 2008

Like the TV show “Who Wants To Be A Millionaire,” this blog entry features the “lifeline” option. Sure, I could’ve gone solo on the post, but when my friend and legal discovery guru David Isom chimed in with his usual insightful analysis, I couldn’t resist the temptation to add his comments about e-discovery.

OK, so on to the discussion… There’s been no shortage of commentary about the impact of early meet and confer sessions on the e-discovery process. In Mikron (Mikron Ind., Inc. v. Hurd Windows & Doors, Inc., 2008 WL 1805727 (W.D. Wash. Apr. 21, 2008)), the court roundly rejected the defendants’ petition for a protective order, which asked the court to shift the costs of electronic discovery to the plaintiff because “searching through their electronically stored information (ESI) would generate substantial costs and yield cumulative results.”

The court used a two-part approach to deny the petition. First, the court held that “defendants failed to discharge their meet and confer obligations in good faith, as required by Fed.R.Civ.P. 26(c).” Elaborating on the purpose behind the Rule:

“[C]ompliance with the Rule would have involved a more substantive discussion regarding defendants’ difficulty in producing responsive ESI, the extent to which defendants have searched ESI to date, and the foundation for defendants’ belief that a more thorough search of ESI, including backup tapes, would yield only information that has already been produced. Plaintiff’s counsel stated that no meaningful discussion of these issues took place before the motion was filed, and defendants have submitted no evidence to dispute this fact. Instead, plaintiff’s counsel received no response when it identified specific “gaps” in production and reasonably asked defendants to articulate the foundation for their assertion that unsearched ESI would produce “little additional responsive information.” … A conversation with opposing counsel does not become a “meet and confer” conference simply because a party has attached that label to the discussion.”

As if that trouncing wasn’t enough, the court went on to conclude that even if the defendants had met their meet and confer burden, the petition would have failed on the merits because they didn’t sufficiently address the “inaccessibility” provisions of Rule 26(b)(2)(B):

“In alleging that continued discovery of their ESI would be unduly burdensome, defendants offer little evidence beyond a cost estimate and conclusory characterizations of their ESI as “inaccessible.” Defendants have not provided the Court with details regarding, for example: (1) the number of back-up tapes to be searched; (2) the different methods defendants use to store electronic information; (3) defendants’ electronic document retention policies prior to retaining an outside consultant; (4) the extent to which the electronic information stored on back-up tapes overlaps with electronic information stored in more accessible formats; or (5) the extent to which the defendants have searched ESI that remains accessible. Beyond the estimated costs, defendants have not demonstrated an unusual hardship beyond that which ordinarily accompanies the discovery process.”

Now, on to the lifeline…as usual, David’s insight into this ruling was right on, which is not surprising since he’s the co-chair of Greenberg Traurig’s national e-Discovery & e-Retention Practice Group. His response (with a bit of editorializing) was:

Even this short opinion denying a motion for protective order has important implications for e-discovery. Mikron is significant for the following reasons:

  1. It demonstrates the level of facts needed to meet the burden to get a protective order limiting the discovery of ESI
  2. It shows the diligence needed to make a litigation hold worth the paper that it is not written on
  3. It’s a useful explication of the burden-shifting of the new inaccessibility provision - Rule 26(b)(2)(B)
  4. It limits what might have been seen as free-ranging cost-shifting (à la Zubulake and Rowe) to rules-based factors such as those in Rule 26(b)
  5. Finally, it’s a good discussion of the types of information that will be needed to meet the initial burden of establishing inaccessibility

In sum, David points out that a big gap exists between merely understanding the new meet and confer provisions and the “inaccessibility” section, and actually having the facts to satisfy them. Like so many areas of the law, just knowing the rules doesn’t mean you have the facts to support your position. So, work with your internal IT folks, vendors, partners, etc. to get a “lifeline” of your own. Failure to do so will probably leave you looking for answers.

For those interested in really being able to comply with these new standards, please see another collaboration with David, which delves into the nuances of Rule 26(b)(2)(B).

E-Discovery Processing: You Get What You Pay For

Tuesday, May 6th, 2008

Anyone reading today’s announcement from Kazeon could be forgiven for doing a double-take: did someone misplace the decimal point? Kazeon claims that it can perform “processing of ESI in preparation for eDiscovery matters as low as $4.30 per Gigabyte.” Assuming that’s not simply a typo, it raises an obvious question: If Kazeon really can process information at a tiny fraction of what e-discovery service providers are charging, why isn’t every e-discovery service provider going out of business? Why wouldn’t everyone take this incredibly good deal?

The answer (in press releases, as in politics) lies in definitions. Exactly what sort of processing would you be getting for your four dollars and change?

You’ll have to ask Kazeon to get the answer to that one, but give a venti latte to a bleary-eyed e-discovery service provider who’s just pulled an all-nighter preparing for a meet-and-confer, and they’ll tell you all about the nuances, complexities, and risks inherent in e-discovery processing that may be difficult for enterprise search/information lifecycle management vendors to grasp. Quite likely, they will refer you to EDRM’s processing node overview, which outlines the basic goals of robust processing:

  • Capture and preserve the body of electronic documents;
  • Associate document collections with particular users (custodians);
  • Capture and preserve the metadata associated with the electronic files within the collections;
  • Establish the parent-child relationship between the various source data files;
  • Automate the identification and elimination of redundant, duplicate data within the given dataset;
  • Provide a means to programmatically suppress material that is not relevant to the review based on criteria such as keywords, date ranges or other available metadata;
  • Unprotect and reveal information within files; and
  • Accomplish all of these goals in a manner that is both defensible with respect to clients’ legal obligations and appropriately cost-effective and expedient in the context of the matter.
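To give a feel for just two of those goals (de-duplication and custodian association), here is a minimal Python sketch. It assumes exact-duplicate detection by SHA-256 content hash, and the document records, field names, and `dedupe` helper are all hypothetical. Real processing tools also deal with near-duplicates, metadata, and email families, none of which is attempted here.

```python
import hashlib
from collections import defaultdict


def dedupe(documents):
    """Keep one copy of each distinct content, but remember every custodian who held it."""
    unique = {}                    # content digest -> first document seen with that content
    custodians = defaultdict(set)  # content digest -> all custodians holding a copy
    for doc in documents:
        digest = hashlib.sha256(doc["content"]).hexdigest()
        unique.setdefault(digest, doc)
        custodians[digest].add(doc["custodian"])
    return unique, custodians


# Hypothetical collection: two custodians hold the same email body.
docs = [
    {"id": "A-1", "custodian": "alice", "content": b"Q3 forecast attached."},
    {"id": "B-7", "custodian": "bob", "content": b"Q3 forecast attached."},
    {"id": "B-8", "custodian": "bob", "content": b"Lunch on Friday?"},
]

unique, custodians = dedupe(docs)
print(len(docs), "collected,", len(unique), "unique")
for digest, doc in unique.items():
    print(doc["id"], "held by", sorted(custodians[digest]))
```

Note that the duplicate is suppressed for review, but the fact that both custodians held it is preserved, which is the sort of detail that matters when you have to defend the process.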

And that’s just the high-level overview. After the caffeine from the latte starts to kick in, they’ll tell you it’s also absolutely critical to:

  • Provide statistical count tie-outs that reconcile every incoming email, loose file, and attachment with the processed document set
  • Automatically scan critical large container files (such as PSTs) for errors and problems prior to processing
  • Automatically perform custodian mapping to track ownership of all documents
  • Maintain detailed reports on every anomaly encountered during processing, down to the individual email, loose file, and attachment
  • Automatically handle common metadata anomalies (with logging) so that the maximum number of documents are made available for review
  • Provide robust and thorough handling for container files regardless of container format
  • Support non-email content types such as contacts, calendar entries, tasks, and notes
  • Robustly handle embedded objects
  • Provide full visibility into exceptions encountered during processing, along with an integrated exception handling process to allow repaired/decrypted data to be easily added back into the document set
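The first item on that list, the count tie-out, is simple to state but easy to skip. Here is a rough sketch of the idea in Python, assuming every incoming item has a stable identifier. The identifiers and the `tie_out` function are made up for illustration and do not describe any vendor's actual report.

```python
from collections import Counter


def tie_out(incoming_ids, processed_ids, exception_ids):
    """Check that every incoming item was either processed or logged as an exception."""
    incoming = Counter(incoming_ids)
    accounted = Counter(processed_ids) + Counter(exception_ids)
    missing = sorted((incoming - accounted).elements())     # came in, never accounted for
    unexpected = sorted((accounted - incoming).elements())  # accounted for, never came in
    return {
        "incoming": len(incoming_ids),
        "processed": len(processed_ids),
        "exceptions": len(exception_ids),
        "missing": missing,
        "unexpected": unexpected,
        "ties_out": not missing and not unexpected,
    }


# Hypothetical run: one attachment silently dropped during processing.
report = tie_out(
    incoming_ids=["msg-1", "msg-2", "att-2a", "file-9"],
    processed_ids=["msg-1", "msg-2", "file-9"],
    exception_ids=[],
)
print(report)
```

If the attachment had been logged as an exception instead of vanishing, the counts would reconcile and the report would say so.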

All that for under five bucks? That’s quite a deal! But remember, if you drive by your corner gas station tomorrow morning and they’re advertising regular unleaded for 20 cents a gallon: It may be cheap, but it’s probably not gas you’re getting.

What’s Different About E-Discovery Search?

Monday, May 5th, 2008

raiders-warehouse.jpgIn his latest article, Craig Ball argues that lawyers “need to learn more about the science of search.” Craig says that at least part of the reason for this is that searching in e-discovery is challenging and different from the searching to which lawyers are accustomed.

“Lawyers believe themselves adept at keyword search in e-discovery because they’ve mastered keyword search in online legal research. The correlation is superficial at best. Unlike the crazy quilt of ESI, the language of reported cases is precise, consistent and structured. Misspellings are rare. Legal research is Disneyland. E-discovery is Baghdad.”

After my last post, I had a conversation with Ron Friedman last month in which he made a similar argument: lawyers need to learn e-discovery search tools.1

I think Craig and Ron make excellent points. E-Discovery search is different and it’s important for lawyers, investigators, litigation support professionals and other practitioners to understand how. The natural questions that arise from their arguments are: what is different about e-discovery search? How is it different from other familiar searches, such as web search and legal research search? The answers are important because they can help guide e-discovery experts on how to train lawyers and even guide attorneys during review. They are also important for developing e-discovery best practices and e-discovery search software.

I think the first step in answering these questions is to agree on the definition of e-discovery search, or, better said, the types of e-discovery search, since there are several. To address this appropriately would take at least another full post or a paper. As a result, I will leave the detailed discussion of these matters to another time, but for this discussion I will focus on searches used to identify potentially relevant documents for purposes of matter assessment (i.e., understanding the nature of the case: who did what, where, when and why) and for document production to the opposing party.

I have observed five major characteristics of e-discovery search that as a whole differentiate it from other searches. I would be interested to hear additional views on what is different about e-discovery search, so please comment on this post.

Recall
First, the cost of missing a relevant document, or low recall, can be very high in e-discovery. Missing a document that you should have produced could result in sanctions and adversely impact the case outcome. Missing key documents could also affect your legal strategy, causing you to make sub-optimal decisions. Missing relevant documents can be costly in other searches as well. For example, in legal research, not identifying case law that is critical to your case could also have a detrimental impact on your legal strategy. However, low recall is on average costlier and more likely in e-discovery. In contrast to e-discovery and legal searchers, web search users are typically not very concerned with missing relevant documents. For the most part, they are interested in the most relevant documents, not all of the relevant documents. This is why Google rarely actually provides all the results for a search (you can try this yourself by paging to the end).

Precision
Second, the cost of returning false positives, otherwise known as low precision, in e-discovery searches is high. The results of e-discovery searches, including false positives, are typically produced and reviewed by humans at costs as high as several dollars per document. On the other hand, false positives have a minimal cost in web search because users either won’t see them if they are ranked low or will ignore them after minimal review. False positives can be costly during legal research in certain scenarios, such as when the stakes and nature of the case are such that many search results need to be exhaustively reviewed, but typically the costs are lower.
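For readers who want to see recall and precision as numbers, here is a small worked example in Python. The document IDs are invented, and the $2.00 review cost is an assumption standing in for the “several dollars per document” mentioned above.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical matter: four documents are truly relevant, the search returns five.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5", "d6", "d7"}

precision, recall = precision_recall(retrieved, relevant)

# Assumed review cost per document; every false positive is reviewed for nothing.
COST_PER_DOC = 2.00
false_positives = retrieved - relevant
missed = relevant - retrieved
wasted_review_cost = len(false_positives) * COST_PER_DOC

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"missed relevant documents: {sorted(missed)}")
print(f"review spend on false positives: ${wasted_review_cost:.2f}")
```

In this toy case the search finds only half of the relevant documents (the recall problem) while more than half of what it returns is noise that someone still has to review (the precision problem). In e-discovery, both errors cost real money.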

Varied Language
Third, documents searched during e-discovery often include personal emails and files and frequently use varied language including jargon, slang, abbreviations, technical terminology, misspellings, and machine-created junk. This is Craig’s “Baghdad” point. In contrast, as Craig points out, documents searched during legal research, such as opinions and motions, are typically well-structured documents with no misspellings and relatively consistent language. Even web sites are generally “cleaner” than typical e-discovery documents.
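As a rough illustration of why exact keyword matching struggles with this kind of language, the Python sketch below expands a search term to similar-looking words actually present in a (made-up) set of emails, using the standard library’s `difflib.get_close_matches`. The sample emails, the `expand_term` helper, and the 0.8 similarity cutoff are all assumptions chosen for the example. This is one simple approach, not a description of how any e-discovery product handles misspellings.

```python
import difflib
import re


def expand_term(term, vocabulary, cutoff=0.8):
    """Return words in the vocabulary that look like spellings or variants of `term`."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=10, cutoff=cutoff)


# Hypothetical email snippets, typos included.
emails = [
    "pls send the forcast before the mtg",
    "Revised forecast attached",
    "forecasts look bad, call me",
]

# Build the vocabulary from the words that actually occur in the collection.
vocabulary = sorted({w for text in emails for w in re.findall(r"[a-z]+", text.lower())})

variants = expand_term("forecast", vocabulary)
print(variants)
```

An exact search for “forecast” would find only one of these three emails. The expanded term list catches the misspelling and the plural as well, though abbreviations such as “mtg” would still need a different technique.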

Complexity
Fourth, users are often looking for different information when performing searches during discovery. E-Discovery searches are often aimed at comprehensively understanding “who did what, when, where and why” in a matter where the people involved may be trying to hide this information and where there may be no single “starting point”. As a result, e-discovery searchers often adopt strategies that involve large numbers of queries, and will follow the evidence and iteratively refine their searches for combinations of topics, people, places, etc. Legal research searches can also be fairly complex, but as with the other differences, this is one of degree. Legal research searches typically don’t involve hundreds of queries and terms, are often more narrowly defined and have a “starting point”. Web searches tend to be even simpler. Most are one or two words.

Transparency
Finally, e-discovery search is part of a legal process. The searches themselves are subject to negotiation with and review by opposing counsel and the court. This process can also take place over long time frames. As such, there is a great need for transparency in the development and execution of e-discovery searches. It is also important for e-discovery searchers to develop a defensible audit trail to prove what searches were run and what results were produced when. This is not the case in web search or legal research.
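To make the audit trail idea a little more tangible, here is a minimal Python sketch of what a single log entry might record: who ran which query, when, and a fingerprint of the result set so it can later be shown that the same results came back. The field names and the `log_search` helper are hypothetical. A defensible system would also need tamper-evident storage, which this sketch does not attempt.

```python
import hashlib
import json
from datetime import datetime, timezone


def log_search(audit_log, query, result_ids, user):
    """Append a record of what was searched, by whom, when, and what came back."""
    ordered = sorted(result_ids)  # order-independent fingerprint of the result set
    entry = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "result_count": len(ordered),
        "result_fingerprint": hashlib.sha256("\n".join(ordered).encode("utf-8")).hexdigest(),
    }
    audit_log.append(entry)
    return entry


# Hypothetical usage: the same query run twice returns the same documents.
audit_log = []
first = log_search(audit_log, 'forecast AND "Q3"', ["d2", "d1"], user="reviewer1")
second = log_search(audit_log, 'forecast AND "Q3"', ["d1", "d2"], user="reviewer1")

print(json.dumps(audit_log, indent=2))
print("same result set:", first["result_fingerprint"] == second["result_fingerprint"])
```

Comparing fingerprints across runs is a cheap way to notice that a result set has changed between, say, the meet and confer and the production date.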

These differences have a number of implications for e-discovery search best practices, training, software and more. I will discuss these in more detail in future posts. However, I think these differences make clear why Craig and Ron are right to suggest that people who are new to e-discovery can benefit from specialized training and tools. Similarly, for those of us who are deeply involved in e-discovery, I believe these differences show that there is still a lot of work to be done in developing best practices and software to make it easier for lawyers and other users to perform e-discovery searches effectively.

1 Ron also wrote another interesting post on this topic which can be found at PrismLegal.com.