Archive for June, 2009

EDRM Continues Drive to Solve Practical Electronic Discovery Problems

Tuesday, June 23rd, 2009

As most electronic discovery veterans are aware, the EDRM Project is an effort founded five years ago by George Socha and Tom Gelbmann to bring together a community of e-discovery practitioners for the purpose of solving some of the industry’s most challenging problems.

It may be hard to believe, but there was a time in the very recent past when the iconic EDRM model did not yet exist. No multicolored boxes, no arrows, no sloping volume and relevance lines: nothing. Coming up with a standard way of talking about electronic discovery was the first problem that the group set about solving, and I think it is hard to dispute that they came up with the gold standard: a simple, clear, concise model that, at least so far, is standing the test of time as a way of thinking about the flow of the e-discovery process.

With each passing year, the group has started to address a broader set of problems, all with a practical bent. Currently, there are eight projects:

Project: Goal
  • Evergreen: Keep the EDRM model fresh and relevant as the industry grows and evolves
  • XML: Provide a standard, generally-accepted XML schema to facilitate the movement of electronically stored information from one step of the e-discovery process to the next
  • Metrics: Provide an effective means of measuring the time, money, and volumes associated with e-discovery activities
  • Code of Conduct: Develop aspirational voluntary ethical guidelines for e-discovery providers and consumers
  • Search: Provide a framework for defining and managing the various aspects of search as it applies to the e-discovery workflow
  • Data Set: Compile a 100 gigabyte public data set that can be used to test various aspects of e-discovery software and services
  • Jobs: Provide a professional resource for the e-discovery community and communicate about e-discovery related jobs
  • Information Management: Explore the emerging need for e-discovery standards in information management (the “upstream” part of the process)

This year’s annual EDRM conference took place back in May. After years of meeting in the same chilly and wind-swept location in downtown St. Paul, Minnesota, George and Tom had the brilliant idea of spicing up the meeting a bit by moving it to a more exotic locale: Bora Bora! Plans were set in motion, but quickly the overwhelming feedback came back from EDRM members: E-discovery is so fascinating, so heart-warming, that adding Bora Bora to the mix would simply be too much for the vast majority of the participants to bear. So St. Paul it was!

This was Clearwell’s third EDRM conference, and location aside, it’s been fascinating to see how it has changed over the last few years. Here are several notable trends from this year’s kickoff:

  • More participation from end-users: There was a definite increase in the number of end-user/consumer participants (that is, those not from the vendor community), particularly from law firms. This could be taken as further evidence that e-discovery is indeed moving in-house.
  • Increased enthusiasm to take on new challenges: One of the great things about EDRM is its willingness to try to tackle new areas that aren’t being directly addressed by some of the other (fantastic) organizations out there like Sedona. This was in evidence several years ago, when Clearwell was fortunate to get involved in the early stages of the EDRM XML project, which has proven to be a huge time, cost, and risk reducer for many in the industry by providing a common standard that can be used to move data within the e-discovery process. It was in evidence last year when Clearwell’s CTO was able to help launch a new effort around Search that is seeking to develop standards and best practices in an increasingly complex and contentious area. And, finally, it was in evidence this year with the launch of the Information Management project, a cutting-edge group that is exploring how to solve the challenges that e-discovery poses for information management – certainly a complex area in need of thought leadership.
  • Improved collaboration: One thing that has amazed us from day one is how collaborative EDRM is, and continues to become. There are a lot of e-discovery vendors involved who, outside of the confines of the St. Paul Hotel, aggressively compete in the marketplace. However, George and Tom have been able to create an environment at EDRM where competitive spirits are set aside and ideas that provide huge value across the e-discovery landscape (both vendor and consumer) can be cultivated.

One final note: If you’re an e-discovery practitioner in a law firm or corporate setting, I’d encourage you to get connected, either informally (through the EDRM web site) or formally (by signing up for one or more of the projects). While end-user involvement continues to grow, there is definitely still a need for more non-vendor involvement. It is critical to ensuring that real and relevant problems get solved, and to pushing the state of the art in e-discovery forward. Please join us!

How To Reduce Electronic Discovery Costs

Monday, June 22nd, 2009

In the post, E-Discovery 911: Reducing E-Discovery Costs in a Recession, we analyzed the question: which electronic discovery activities are the most costly today and thus have the greatest room for cost reductions? An analysis of a typical, hypothetical case demonstrated that the bulk of e-discovery costs reside in the processing and review stages. In this post, we want to look at the different ways of reducing e-discovery costs and which are likely to be the most effective, especially given that processing and review are the largest sources of expense.

Corporations have the following options for reducing e-discovery costs. Some of these approaches are aimed at changing the overall way e-discovery is performed. And some of these are aimed at improving the results of a particular step within a typical e-discovery process. None of the options are mutually exclusive.

  • Retain less data through information management: one of the methods that corporations can undertake to reduce e-discovery costs even before e-discovery has begun is to adopt a data or document retention policy. Such a policy can, for example, stipulate that the corporation deletes all documents not required for specific business, legal or compliance reasons after a fixed period of time, such as 90 days. As a result, a properly implemented document retention policy has the potential to significantly reduce the amount of data that is identified and collected during electronic discovery.
  • Better assess your case and your discovery issues: another approach to reducing the overall costs of litigation, including discovery, is to perform an early case assessment. Pioneered by DuPont and others, the objective of this approach is to understand all the key case facts within a short period of time so that the litigation team can make better decisions more quickly. Because costs always rise over time, quicker resolution of litigation reduces costs. While early case assessment was originally an overall approach to litigation, there is now an equivalent in electronic discovery. The goal is to identify all the potentially discoverable data, but only collect, process, and analyze a prioritized portion of this data in order to inform an understanding of the case AND calculate an estimate of the ultimate potential e-discovery costs.
  • Bring e-discovery in-house: another holistic method for reducing electronic discovery costs is to manage all or a portion of the e-discovery process in some or all matters inside the enterprise as opposed to outsourcing it to law firms or litigation service providers. While bringing e-discovery in-house has other benefits, such as improved security and control, the principal benefit is to convert variable service costs, typically priced on a per-gigabyte basis, into fixed software costs, thus producing a return on the investment in managing e-discovery in-house.
  • Preserve and collect less data: in addition to holistic approaches, e-discovery costs can be reduced at each step in the e-discovery process. One way to reduce e-discovery costs would be to preserve and collect less data. Reducing the amount of preserved and collected data not only reduces the cost of each of these steps but also reduces the cost of each downstream step. There are pros and cons to this approach which I will discuss in a later post.
  • Process less data: more data is frequently preserved and collected than needs to be processed for analysis and review. This excess data can be filtered out prior to processing thus reducing processing and all other downstream costs. The techniques used to do this are often referred to as pre-filtering, pre-processing or early data analysis.
  • Process differently and review native: historically, most electronic data was converted to an image format, such as TIFF, prior to review. This process is computationally intensive and expensive. In recent years, e-discovery practitioners have been processing and reviewing more documents in a native or near-native format and avoiding the cost of converting documents to an image format until later in the process.
  • Review less data: data can also be reduced after processing and prior to review and production. Much has been written in the e-discovery community about this process, often called “cull-down,” and the different search and analysis techniques that can be used as part of this process, such as keyword search, concept search, de-duplication, and others. Because processing and review account, as we have seen, for a substantial portion of the overall costs, the fewer documents that require them, the lower the overall costs.
  • Review data faster: in addition to reviewing less data, the electronic discovery community has pioneered new methods of reviewing data faster including data clustering, near de-duplication, and other more automated review techniques. The faster documents are reviewed, the lower the attorney review costs.
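
To make the “cull-down” idea above a little more concrete, here is a minimal Python sketch of two of the techniques mentioned: exact de-duplication and keyword search. It is an illustration only, not how any particular e-discovery product works. The `Document` type, the sample corpus, the whitespace-and-case normalization, and the keyword list are all assumptions made up for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Hypothetical minimal record; real platforms track far more metadata.
    doc_id: str
    text: str


def content_hash(text: str) -> str:
    # Assumption: collapse whitespace and case so trivially different
    # copies of the same text produce the same hash.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(docs):
    """Keep the first document seen for each distinct content hash."""
    seen = set()
    unique = []
    for doc in docs:
        digest = content_hash(doc.text)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def keyword_filter(docs, keywords):
    """Keep documents containing at least one keyword (case-insensitive substring match)."""
    lowered = [k.lower() for k in keywords]
    return [d for d in docs if any(k in d.text.lower() for k in lowered)]


def cull(docs, keywords):
    """De-duplicate first, then apply the keyword filter."""
    return keyword_filter(deduplicate(docs), keywords)


# Made-up sample corpus for illustration.
corpus = [
    Document("A1", "Q3 pricing proposal for the Acme contract"),
    Document("A2", "Q3  pricing proposal for the ACME contract"),  # same text after normalization
    Document("A3", "Lunch on Friday?"),
    Document("A4", "Revised contract terms attached"),
]

review_set = cull(corpus, ["contract", "pricing"])
print([d.doc_id for d in review_set])
```

In this toy corpus, four documents go in and two come out for review. Concept search, near de-duplication, and clustering are considerably more involved and are not attempted here.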

While all of these approaches have the potential to reduce the costs of electronic discovery, some are going to be more effective than others. Each approach can be implemented using a multitude of techniques or practices, and each of these techniques has its pros and cons. For example, some techniques may have a greater risk of raising defensibility issues from the court or opposing side than others. Other practices may be less expensive initially, but, over the course of a changing and iterative e-discovery process, may prove to be more costly overall. In a series of future posts, we’ll review the different practices used as part of these approaches and analyze the pros and cons of each to understand which may be the most effective for your organization.

Electronic Discovery Services: The Price is Right?

Wednesday, June 17th, 2009

Maybe this will show my age, but I’ve been around the electronic discovery business since the days when pricing was both simple and very expensive. Terabytes were at the mythical high-end of the spectrum and gigabytes of “e-docs” (not “ESI”) cost $3,000 – $4,000 to process. Understandably (and fortunately for most), pricing models have evolved, thanks in part to more educated consumers and initiatives such as Sedona’s RFP + Vendor Panel.

Leaving the WABAC machine and moving into present times, we’re starting to see some variance from traditional pricing models that primarily focus on data “into” the processing machine. More and more companies (such as Kroll Ontrack) are moving to models that price on data “out” of the process. Since that’s a bit nebulous, an example might illustrate:

Traditionally, in a somewhat simplified fashion, an electronic discovery project would be priced by the amount of data in the initial corpus (say 100 gigabytes) and processing would be priced at $500 a gigabyte (for round numbers purposes). Leaving out the sometimes significant caveat that the 100 gigabytes would likely increase due to expansion of compressed files, this would mean that the bulk of the project expenses would be $50,000 ($500 x 100), plus relatively nominal costs for monthly hosting and user access rights.

At the end of the day, after elimination of system files, deduplication and application of search terms (reducing the initial corpus by say 70% collectively) there would be 30 gigabytes remaining for hosting and possible production, both of which are most often priced separately.

Given rampant commoditization, there’s an arms race underway among certain service providers where they’re now changing the above model to give away initial processing as a loss leader, pricing only on the data that comes out the end of the processing/search step. In this approach the above workflow would largely stay the same, but the vendor would charge a higher rate for what ultimately is hosted on the back-end. If this back-end fee were $2,000 per resulting gigabyte and the same 30 gigabytes were seen out the back end, then the customer would pay $60,000 for the project. But if the deduplication, searching, culling, etc. were more effective (at say 80%), then the resulting 20 gigabytes would only cost $40,000.
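
For readers who want to plug in their own numbers, the two pricing models above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration using the round numbers from this post ($500 per gigabyte in, $2,000 per gigabyte out, a 100 gigabyte corpus). The function names are made up for the example, and it deliberately ignores file expansion, hosting, and user access fees, which are priced separately. The break-even rate is not quoted above; it is simply what falls out of these two rates.

```python
def input_priced_cost(corpus_gb, rate_per_gb_in):
    """Traditional model: pay for every gigabyte that goes into processing."""
    return corpus_gb * rate_per_gb_in


def output_priced_cost(corpus_gb, reduction_pct, rate_per_gb_out):
    """Newer model: pay only for the gigabytes left after culling.

    reduction_pct is a whole-number percentage (e.g. 70 for 70%).
    """
    remaining_gb = corpus_gb * (100 - reduction_pct) / 100
    return remaining_gb * rate_per_gb_out


def break_even_reduction_pct(rate_per_gb_in, rate_per_gb_out):
    """Reduction percentage at which both models cost the same."""
    return 100 * (1 - rate_per_gb_in / rate_per_gb_out)


CORPUS_GB = 100   # round numbers from the example in this post
RATE_IN = 500     # dollars per gigabyte, priced on data in
RATE_OUT = 2000   # dollars per gigabyte, priced on data out

print(f"Priced on data in: ${input_priced_cost(CORPUS_GB, RATE_IN):,.0f}")
for pct in (70, 80):
    cost = output_priced_cost(CORPUS_GB, pct, RATE_OUT)
    print(f"Priced on data out at {pct}% reduction: ${cost:,.0f}")
print(f"Break-even reduction: {break_even_reduction_pct(RATE_IN, RATE_OUT):.0f}%")
```

With these particular rates, the output-priced model only beats the $50,000 input-priced figure once culling removes more than 75% of the corpus, which is exactly the guess the buyer is being asked to make.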

The question then, as Clint Eastwood would put it, is: “Do you feel lucky?” This pricing model forces attorneys and litigation support managers to guesstimate what culling, search, and de-duplication rates they’ll likely get on the data corpus. Guess right and they save the end client money; guess wrong and they’re way over budget.

The dynamics of this purchasing decision are a bit atypical because the buyer (usually counsel) doesn’t pay the bills, so the decision can often be more vexing than most. When a direct consumer gambles on pricing, things will ideally balance out over time, with money being saved in some instances and overspent in others. But when the buyer doesn’t pay the bills, the motivation is less clear.

Thoughts run to Maslow’s hierarchy of needs to determine which pricing model is ultimately more compelling: (a) price certainty/adherence to budget, or (b) cost variability and the opportunity to save money. While it’s never good to understate the upside of saving money (Esteem), I think ultimately there’s a more fundamental need (Safety) to stay within budget and avoid the painful (sometimes client-imperiling) call to discuss how a given e-discovery project has gone way over budget.

This calculation is made even more vexing because it not only pits the purchasing party against unknown data culling/searching rates, but it also puts the vendor in an ethical bind: they make less money if they’re supremely effective at data reduction, whereas if they achieve relatively little data reduction, whether intentionally or accidentally, they stand to capture a ton of upside.

It’s like you went to Vegas to gamble your kid’s college fund and on top of the already questionable house odds you knew that the dealer stood to profit by your losses. So, as for myself, no, I don’t feel lucky.

Social Media: Electronic Discovery’s New New Thing?

Monday, June 1st, 2009

Lately, the electronic discovery blogosphere has been, well, a-twitter about Twitter and other social media as they relate to electronic discovery. While Twitter struggles to find a business model, enterprises and law firms are racing to understand the implications of this latest boomtown of user-generated content that’s being built out on the frontier of the World Wide Web (or is that Wild Wild West?).

There’s talk of intellectual property being cast out, irrevocably, onto the Internet for all to see. Or slanderous things being uttered for which your company may be held liable. But, hold on a second: is there really anything new here? Anyone heard of e-mail? Web pages? Peer-to-peer? Google? Instant messaging? As Debra Logan astutely points out in her recent post on the topic, “everything that exists is discoverable (at least pretty much).” If you haven’t already, take a look at the FRCP’s definition of ESI and you’ll get her point. So, yes, it’s obviously important to have a common sense corporate policy around what’s appropriate and what’s not for the public Internet, but it shouldn’t be any different from the policy that you should have already had in place regarding blogs, web pages, and email.

What about the other side of the electronic discovery coin: finding information that’s responsive to a request? If anything, social media are more easily discoverable than just about any other form of user-generated content (though admittedly in some cases they can be more transient, which can pose unique challenges). And, while it’s not universally true, the argument can be made that the more easily something can be discovered, the lower the cost and risk of that content to you. Worried if anyone on Twitter is stealing your new idea for a router architecture? How about the top-secret approach to making coffee you were thinking about patenting? Well, if anyone twittered about it, tracking it down is a snap. Just keep in mind that because of the public nature of social media, it’s likely that the more important the information is to your company in the context of electronic discovery, the less likely it is to live out on the public Internet. Obviously, there will be exceptions. But when there are those exceptions, tracking down the relevant information will likely be a fairly straightforward and relatively inexpensive process.

However, before we dismiss social media as nothing new and something that can largely be addressed through already-existing policies and discovery techniques, let’s consider one aspect of social media that is on the upswing, but often out of the blogging limelight: enterprise applications.

Increasingly, companies are moving to advanced enterprise social media platforms such as Jive or SocialText as a way of improving internal collaboration and making projects run more smoothly and effectively. Because such enterprise platforms are often used on a company’s most important and strategic projects, having robust e-discovery capabilities to allow internal blog, wiki, and discussion content to be captured and placed into a format that can be seamlessly searched along with other more traditional documents is becoming critical to forward-thinking enterprises.

For example, I recently came across a large financial institution that uses Jive SBS as its wiki and Clearwell as its e-discovery solution. What surprised me is that this company has created its own Jive/Clearwell “adapter” that feeds Jive discussions directly into Clearwell as a conversation thread. This is just one example, but I’m sure more will follow. Over time, it will become a requirement for e-discovery platforms to integrate with enterprise social media products. And, rest assured, as that happens, we’ll be sure to tweet about it!

UPDATE: Whit Andrews of Gartner was kind enough to point out his (prescient) research note on the subject of e-discovery and social networking from November, 2007. He points out that there is in fact a very important “new new thing” about social networks, which is that they may be able to be leveraged in an e-discovery context to find out more about the people relevant to an investigation. By tapping these publicly available sources of information, investigators may be able to gain better insight into private (i.e. enterprise) information stores to guide the e-discovery process. More detail on this and other insights can be found at http://www.gartner.com/DisplayDocument?id=543110&ref=g_forward&call=email.