Archive for the ‘review’ Category

Top Ten Trends in Electronic Discovery

Wednesday, November 11th, 2009

Now that I’ve finished off the last of the Halloween candy and tossed out the moldy, squirrel-ravaged pumpkins, it occurred to me that this might be a good time to think about what 2010 will hold for the electronic discovery industry.  My 2009 list seems to have been fairly prescient, and many of those notions still hold true since the legal industry (as we know) doesn’t move at the most blistering pace.

Again, doing my best Nostradamus impersonation, here are my top ten trends for 2010:

  1. Early case assessment (ECA) moves from a “nice to have” to a “must have” requirement for any matter involving electronically stored information (ESI).  In 2009, we saw ECA move into the mainstream as a methodology to quickly understand case facts, assess risk and lower both review and data processing costs.  But, in 2010, with the advancement of the tools and the increased socialization within the bar and the litigation support community, ECA will graduate into a core methodology for savvy litigators regardless of matter type or size.
  2. Appetites for broad information lifecycle management initiatives diminish as organizations realize these programs are far too complex to solve specific pain points, and they often take too much time (measured in years) to execute.  The economic reality is that these holistic, cross data, cross enterprise pipe dreams really can’t demonstrate the ROI that’s needed in today’s challenging economy.
  3. Staffing roles continue to evolve with a newfound focus on project management. The in-house e-discovery coordinator will emerge as more of a project management and analyst role than a pure legal or IT one. This shift will become increasingly necessary as e-discovery evolves from an ad-hoc fire drill to a standard business process that is repeatable, measurable, and defensible.
  4. Data analytics and statistical methodologies gain traction to augment the subjective decision-making approaches that have historically formed the backbone of the e-discovery search and review processes.  These objective methodologies have long been called out as best practices by the likes of the Sedona Working Group. In 2010, they will start to move from theory to practice as e-discovery tools increasingly move in-house and departments enhance defensibility by adding elements such as sampling to the workflow.
  5. Platform e-discovery solutions finally become a reality as customers graduate from painfully stitching point solutions together, thus requiring fewer physical document hand-offs (i.e., exports and imports) between applications, cutting costs and lowering the risk of data loss.
  6. Associate-based review gradually goes extinct, as both clients and law firms tire of expensive, linear review processes.  More review work becomes either insourced or is managed with specialized contract attorneys, who are both cheaper and better trained for this type of work.
  7. Similarly, FRE 502 and “clawback” agreements will be increasingly used to reduce the need for manual, eyes-on review, although many litigators will resist this trend for fear that they cannot “un-ring the bell” once privileged information has been disclosed in any context.
  8. While perhaps anathema, alternatives to the much lauded EDRM model will gain traction, as practitioners strive to find an even better, and perhaps more practical, project management framework, in many cases acknowledging the role that the EDRM has taken in forming *the* lingua franca of the e-discovery industry.
  9. The push for cooperation in the e-discovery process will make incremental progress despite reluctance among old school litigators.  Ironically, this type of cooperation, as strongly advocated by the Sedona Working Group, will increasingly be forced by judges and local rules.
  10. “Cloud” computing starts to really impact how e-discovery data preservation/collection is done, both in terms of social media and traditional ESI.  More and more companies block social media applications and file types in the workplace because of fears surrounding the inability to preserve and collect.

E-Discovery MythBusters: Debunking Common Myths About ECA

Tuesday, August 25th, 2009

We’ve devoted a number of posts to the topic of ECA, ranging from a quest to define the acronym all the way to the cost savings benefits of the ECA approach.  And, while there seems to be relative unanimity around the beneficial aspects of ECA, there still seem to be a number of myths and misconceptions.  So, à la MythBusters, we’ll run these myths through the gauntlet to see which survive scrutiny.

Myth #1: ECA Is Only Valuable if Performed “Early”

Certainly, ECA is best leveraged and will be most valuable when performed at the outset of litigation.  As has been stated before, it has value on two primary fronts, the first being the ability to scope electronic discovery (both in terms of cost and timelines).  The next is the more traditional value proposition where ECA is used to get an understanding of the case facts to enable the strategic decision making process.

Even so, there are scenarios where an ECA methodology would still generate value even if performed “later” in the matter.  For instance, in bifurcated class action litigation, initial discovery about the class may occur months before discovery on the merits.  In this instance, using a later ECA approach would still make sense since discovery about the case facts may not have been possible earlier on.  Similarly, “late” ECA may still hold value when new parties or claims are added to an existing lawsuit, or when there’s a substantial change in case direction, data, or custodians.

Myth #2: ECA Is Only Performed With Technology

Sure, enterprise grade ECA products are an important part of the mix, but the products won’t perform an ECA by themselves.  There’s just too much subjective decision making involved in the assessment process.  Therefore, the right people are critically important, not only in terms of experience performing this analytical work, but also in their ability to capably testify about the underlying decision making process.  It’s also important to be able to follow a repeatable and defensible process to show that the “recipe” used was aligned with industry best practices and wasn’t ginned up for a particular engagement.

Myth #3: ECA Only Works With Large ESI Volumes

Yes, ECA methodologies make a lot of sense for large, bet-the-company matters because even modest savings when processing, analyzing and reviewing terabytes will easily approach six to seven figures.  However, smaller matters will still benefit from better budgetary insights that facilitate informed matter management.  And, in a way, there’s almost more benefit from being able to quickly evaluate (fight/settle) smaller suits since the transactional costs are so high relative to the amount in controversy.  In both scenarios it’s important to view objective case data to prepare for meet & confer conferences.

Myth #4: Clients Don’t Want To Pay for ECAs

Many end clients (corporate counsel typically) have a similar litigation mindset:  i.e., the desire to avoid costs for as long as possible.  While avoiding early costs makes some sense on its face, the fact is that spending a small amount of money early on (for budgetary and case assessment purposes) will in most instances reduce the overall litigation budget.  It’s the classic, “you can pay me now, or pay me later” situation.

Counsel must understand that while some costs are incurred early in the process the benefits are crystal clear: i.e., determining customized case strategies early in the matter to decide whether to fight or settle.  Similarly, corporate clients must recognize that the benefits outweigh the costs and require their litigation counsel to include this process in every significant matter.

This illustration highlights how an initial ECA investment actually pays for itself over the life of the litigation.


Myth #5: ECAs Begin when the Complaint is Filed

Many newbie ECA practitioners may think that the clock for an ECA starts when the complaint is filed.  And, while this isn’t patently ridiculous, I think the better approach is to begin the clock at the time litigation becomes “reasonably likely,” rather than at later dates such as when the complaint is filed or when discovery is propounded.  This is also the trigger for preservation obligations and a host of interrelated activities such as ESI “identification,” which makes the matter kick-off more synchronized.

For more information about ECA, watch a recording of our recent webinar — E-Discovery MythBusters: Debunking Common Myths About Early Case Assessment.

Clearwell Expands Its E-Discovery Platform with New Modules for Pre-Processing, Review, and Production

Monday, August 17th, 2009

Earlier today, Clearwell announced Version 5.0 of its e-discovery platform. Unlike prior versions which focused on processing, early case analysis, and first-pass review, this release extends Clearwell’s capabilities in two directions: upstream, by adding pre-processing; and downstream, by adding document-by-document review and production. I wanted to say a few words about what motivated these changes, and why the new release greatly increases Clearwell’s value to enterprises, government agencies, law firms, and litigation support service providers.

Over the past year, the benefits of early case analysis and first pass review have driven hundreds of companies to adopt Clearwell. They have saved huge amounts of money and time, and often become evangelists for the product. But despite that, we continually hear that the overall e-discovery process remains expensive, unpredictable, and risky. When we investigated why, we found the problem lies less in the features of the products being used than in the number of products used.

Once data is collected, a typical e-discovery process today may involve as many as four different tools: one for filtering by custodians or date range, another for de-duplication and keyword search, another for load file creation, and yet another for review and production. Each time data moves between these tools, and there’s a handoff from one to another, there’s the risk that document counts do not tie out, data does not convert correctly, or any of a hundred other things go wrong. This risk is magnified by the fact that e-discovery is highly iterative: custodians are often added or keywords changed as new information comes to light, forcing people to redo many steps of the process. As a result, timelines are unpredictable and it’s hard to stick to a budget, even with extensive project management, which itself is not cheap.
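To make the "counts tie out" concern concrete, here is a minimal Python sketch of the filtering, de-duplication, and keyword search steps run in one place with an audit trail. It is an illustration only, not how Clearwell or any other product works: the document fields, the `run_stage` and `counts_tie_out` helpers, and the sample data are all hypothetical.

```python
import hashlib
from datetime import date

# Hypothetical collected documents; every field name here is an assumption.
documents = [
    {"id": 1, "custodian": "alice", "sent": date(2008, 3, 1), "text": "Please terminate the contract"},
    {"id": 2, "custodian": "alice", "sent": date(2008, 3, 1), "text": "Please terminate the contract"},
    {"id": 3, "custodian": "bob", "sent": date(2008, 5, 9), "text": "Lunch on Friday?"},
    {"id": 4, "custodian": "carol", "sent": date(2008, 4, 2), "text": "Terminate the agreement now"},
    {"id": 5, "custodian": "bob", "sent": date(2006, 1, 15), "text": "Old contract terms"},
]


def run_stage(name, docs, keep, audit):
    """Apply one filter and record how many documents went in, stayed, and dropped."""
    kept = [d for d in docs if keep(d)]
    audit.append({"stage": name, "in": len(docs), "kept": len(kept),
                  "dropped": len(docs) - len(kept)})
    return kept


def make_dedup_filter():
    """Return a filter that keeps only the first document with a given text hash."""
    seen = set()

    def keep(doc):
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen:
            return False
        seen.add(digest)
        return True

    return keep


def process(docs, custodians, start, end, keywords):
    """Run all three stages in sequence and return the survivors plus the audit trail."""
    audit = []
    docs = run_stage("custodian/date filter", docs,
                     lambda d: d["custodian"] in custodians and start <= d["sent"] <= end,
                     audit)
    docs = run_stage("de-duplication", docs, make_dedup_filter(), audit)
    docs = run_stage("keyword search", docs,
                     lambda d: any(k in d["text"].lower() for k in keywords),
                     audit)
    return docs, audit


def counts_tie_out(audit, total_in):
    """Check that each stage received exactly what the previous stage kept."""
    expected = total_in
    for row in audit:
        if row["in"] != expected or row["kept"] + row["dropped"] != row["in"]:
            return False
        expected = row["kept"]
    return True


results, audit = process(documents, {"alice", "bob"},
                         date(2008, 1, 1), date(2008, 12, 31),
                         ["terminate", "contract"])
for row in audit:
    print(row)
print("counts tie out:", counts_tie_out(audit, len(documents)))
```

Because every stage lives in one function, adding a custodian or changing a keyword means calling `process` again with new arguments rather than repeating exports and imports between tools.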

Since the problem lies in the handoffs between different products, it’s impossible to solve this problem by making any one part of the process better. The only solution is to have a single product that can manage collected data from soup (filtering / pre-processing) to nuts (production). Prior to today’s announcement, that product did not exist: there was no single, integrated product that could do everything from process data to review and produce it. And that, in summary, is why Clearwell is releasing Version 5.0.

With Clearwell’s new product, there are no handoffs, no uncertainty about how long it will take to export out of one tool and into another. There’s no need to cobble together a string of different products or train lawyers on multiple different interfaces and workflows. As a result, the risks of cost overruns or missed deadlines are greatly reduced.

To our mind, this is just part of a natural evolutionary process that affects many markets, not just e-discovery. Who wants to carry a Palm Pilot, iPod, and a mobile phone when you can carry a single device like the iPhone? Who wants a cable receiver and a TiVo when you can get both in a single set-top box?  As markets mature, there develops a logical package of functionality that customers prefer to buy from a single, integrated provider.

You can sign up for a product demonstration at our website, or come see the product at ILTA next week (Booth 606). Take a look – and let us know what you think.

Cutting Through The Confusion: A Buyer’s Guide To Electronic Discovery Software

Sunday, April 19th, 2009

Over the past 4 years, I have had hundreds of conversations with corporate counsel and “legal IT”, meaning technical folks charged with supporting the legal team. More and more of them are looking to lower their costs by bringing e-discovery in-house. But as they work through that process, there’s one question that consistently comes up, even today – namely, “When [insert name of software company] says they “do” e-discovery, what exactly does that mean?”

There has been progress towards answering this question, thanks mainly to the analyst community. George Socha and Tom Gelbmann’s EDRM framework has been immensely helpful in breaking down electronic discovery into its component steps. Other analysts, like Debra Logan at Gartner, were quick to embrace the framework, prompting every software provider to follow suit. As a result, there is today a common language that everyone uses to describe the e-discovery process.

The Electronic Discovery Reference Model (EDRM) breaks down the e-discovery process into a series of steps. Companies looking to buy e-discovery software to lower costs typically map different software products to each of these steps, to make sure that they cover the entire process.

But having a universally-agreed framework is only half the answer. To eliminate customer confusion, there also needs to be agreement on how different software products fit into the framework. This is especially important since there is no single, end-to-end solution for e-discovery which covers all aspects of EDRM. So customers are forced to think about how different software solutions fit together. And that is where things begin to fall apart.

Many software vendors feel it is advantageous to claim that they do everything, even though they do not. Customers are rightly suspicious of those claims, and so press vendors to provide more detailed information – hence the question, “when you say you do e-discovery, what exactly does that mean?”

In light of that, how can litigation support teams, corporate counsel, or legal IT people figure out which e-discovery solution best meets their needs? From observing this decision-making process hundreds of times, I have found 3 simple steps are incredibly helpful.

Step 1: Read the analyst reports

Two reports in particular make for required reading. One is Gartner’s MarketScope Report, which is available for free at certain sites; the other is the 451 Group’s recent e-discovery report, which is summarized in a publicly available presentation. The helpful thing about the 451 Group’s report is that it tells you which software companies do which parts of the EDRM process. You do have to buy the report to get the full picture (it’s well worth it!), but the publicly available presentation will give you a flavor for their analysis, and I have drawn from that presentation in the figure below:

Analyst firms like the 451 Group map software vendors to the EDRM framework according to what they actually do, which is often different from what software vendors claim they do.

The 451 Group’s analysis highlights several important points. First, it shows that there is no single end-to-end solution. Even the products of giants like EMC (SourceOne), HP (IAP), and IBM (CommonStore) only solve one piece of the puzzle, information management. Second, it shows that customers have choices at each stage of the EDRM process. For example, to solve the problem of identification, collection, and preservation of electronic information, customers can choose from solutions as diverse as Guidance EnCase (forensic collection), Index Engines (back-up tapes) and Mimosa NearPoint (email archive). Third, it provides an independent assessment of what vendors do, as opposed to what they may claim. For example, Kazeon claims analysis and review capabilities, whereas the report shows its product does identification, collection, and preservation; Recommind claims its Axcelerate eDiscovery and MindServer products do processing, whereas the report finds that they do not.

Step 2: Evaluate the products prior to purchase

Just as anyone would test-drive a car prior to purchase, it’s critical to test-drive e-discovery software. Any vendor should be willing to provide their software free of charge for an evaluation on-premise. The most effective evaluations are when the customer uses the product themselves, either on a live case or test data. This is far preferable to just sending the data to the vendor who then loads it into their system, as in that scenario there are too many opportunities for the vendor to hide their product’s shortcomings.

Step 3: Check references carefully

The trick with references is to insist on relevant references. It’s not good enough for the vendor to dredge up some random person who says nice things; or even a credible knowledgeable person who is using the product in a completely different way. For example, if a company is happy with Autonomy’s IDOL for enterprise search, that does not tell you much about what Autonomy might be like for e-discovery. What really counts are references from other customers who are using the product for the same application that you are.

All this can sound like a lot of work, but I have seen people go through the process in as little as a month, and be much happier for it. A little work up front can save a lot of time (and heart-ache!) later on.

Federal Rule of Evidence 502: Help or Hype?

Thursday, November 13th, 2008

There’s a lot of excitement (and corresponding uncertainty) about the recent passing of Federal Rule of Evidence 502 (FRE 502), which was signed into law on Sept 19th.  The main reason that the legal community is excited about FRE 502 is because of the potential for cost savings by reducing the amount of money associated with the e-discovery review process, which is routinely viewed as the most expensive area in the entire e-discovery process.

In combination with the codification of a national standard to determine when a privilege has been waived, FRE 502 is primarily designed to make the use of claw-back agreements a truly viable prospect when doing e-discovery privilege review.  It should (ideally) provide some relief from rapidly escalating e-discovery costs.  Or, at least that was the impetus behind the rule’s creation, according to the Comments:

“The proposed new rule facilitates discovery and reduces privilege-review costs by limiting the circumstances under which the privilege or protection is forfeited, which may happen if the privileged or protected information or material is produced in discovery. The burden and cost of steps to preserve the privileged status of attorney-client information and trial preparation materials can be enormous. Under present practices, lawyers and firms must thoroughly review everything in a client’s possession before responding to discovery requests. Otherwise they risk waiving the privileged status not only of the individual item disclosed but of all other items dealing with the same subject matter. This burden is particularly onerous when the discovery consists of massive amounts of electronically stored information.”

In short, FRE 502 is designed to establish uniform, nationwide standards for waiver of attorney-client privilege and work product protection, with the main goal being to protect producing parties against the inadvertent disclosure of privileged materials or work product in either federal or state proceedings.  The salient section is subsection (b) which states that when a disclosure of privileged information is made in a federal proceeding or to a federal agency, the disclosure does not constitute a waiver if:

  1. the disclosure is inadvertent;
  2. the holder of the privilege or protection took reasonable steps to prevent disclosure; and
  3. the holder promptly took reasonable steps to rectify the error, including (if applicable) following Federal Rule of Civil Procedure 26(b)(5)(B).

The end game here is presumably to increasingly leverage automated review methodologies to save costs.  But facilitating this type of review without taking on unhealthy levels of risk means that claw-back provisions must be as airtight as possible to protect against inadvertent productions of privileged electronically stored information (ESI).  And yet, exactly how FRE 502 will work in practice is up for debate since there isn’t any case law interpreting it yet.

One area that’s top of mind is how this new Rule will impact the recent decisions on e-discovery search, including the Victor Stanley case authored by Chief Magistrate Judge Grimm.  Since FRE 502 contains a core “reasonableness” prong in section (b) it’s likely that Grimm’s proclamation about e-discovery search will still be controlling.  Grimm fundamentally had to evaluate whether the producing party’s search protocols and procedures were in fact reasonable.

“Defendants, who bear the burden of proving that their conduct was reasonable for purposes of assessing whether they waived attorney-client privilege by producing the 165 documents to the Plaintiff, have failed to provide the court with information regarding: the keywords used; the rationale for their selection; the qualifications of M. Pappas and his attorneys to design an effective and reliable search and information retrieval method; whether the search was a simple keyword search, or a more sophisticated one, such as one employing Boolean proximity operators; or whether they analyzed the results of the search to assess its reliability, appropriateness for the task, and the quality of its implementation.” (footnotes omitted).

In Victor Stanley, the producing party wasn’t able to demonstrate reasonableness because they neither carefully crafted their search strategy nor conducted any sampling to make sure that the e-discovery search worked as designed.  This type of analysis would still seem to come into play under FRE 502 and so, as Grimm states, the use of either a best practices or collaborative approach to e-discovery would seem to be as important as ever.
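As a rough idea of what "sampling to make sure the search worked as designed" could look like, here is a small Python sketch. It draws a random sample from the documents a privilege search did not flag, has them reviewed, and reports the observed miss rate with a Wilson score interval. Everything here is an assumption for illustration: the opinion does not prescribe this method, `sample_check` is a hypothetical helper, and `is_privileged` stands in for a human reviewer's call.

```python
import math
import random


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n == 0:
        raise ValueError("sample must not be empty")
    p = hits / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sample_check(unflagged_ids, is_privileged, sample_size, seed=0):
    """Review a random sample of unflagged documents and estimate the miss rate."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced later
    population = sorted(unflagged_ids)
    sample = rng.sample(population, min(sample_size, len(population)))
    misses = sum(1 for doc_id in sample if is_privileged(doc_id))
    low, high = wilson_interval(misses, len(sample))
    return {"sampled": len(sample), "missed": misses,
            "rate_low": low, "rate_high": high}


# Hypothetical data: 1,000 unflagged documents and a stand-in reviewer
# that treats every 50th document as privileged.
report = sample_check(range(1000), lambda doc_id: doc_id % 50 == 0, sample_size=200)
print(report)
```

Recording the seed, the sample size, and the resulting interval is one way to "show your work" if the reasonableness of the search is later challenged.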

Given that backdrop, it’s as important as ever that parties “show their work” when it comes to e-discovery search.  Whether FRE 502 will really make parties feel safe enough to use automated review processes (thereby reducing costs) remains to be seen.  But unifying standards and expectations is at least a very positive first step.

Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth.  The myth is that there is an agreed upon definition of concept search.  In actuality, when people in e-discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not actually talking about concept search technology at all and are actually talking about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling.  First, culling needs to be transparent.  You need to be able to get agreement with or at least explain to the opposing side and the court exactly how you have culled the data set.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either agree on the set of search terms with the opposing side or to use e-discovery search best practices to perform keyword searches on your own.

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that your group of documents that have been categorized as being related to, for example, a company’s hiring policies include all of the documents in your matter related to hiring policies or that they do not include some documents that may not really be related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.
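The "terminate" example can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any vendor's implementation: the `THESAURUS` mapping, the six sample documents, and their relevance labels are all made up for the purpose.

```python
import re

# Hypothetical thesaurus: the related terms are assumptions for illustration.
THESAURUS = {"terminate": {"fire", "dismiss"}}

# Hypothetical documents, each with a hand-assigned relevance label.
DOCS = {
    1: ("We decided to terminate his employment", True),
    2: ("HR will dismiss the employee on Friday", True),
    3: ("They plan to fire the regional manager", True),
    4: ("The fire department inspected the warehouse", False),
    5: ("A house fire delayed the shipment", False),
    6: ("Quarterly sales figures attached", False),
}


def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(terms):
    """Return the ids of documents containing at least one of the terms."""
    return {doc_id for doc_id, (text, _) in DOCS.items() if tokenize(text) & terms}


def keyword_search(term):
    return search({term})


def concept_search(term):
    """Expand the query with thesaurus terms before searching."""
    return search({term} | THESAURUS.get(term, set()))


def precision_recall(hits):
    """Precision: share of hits that are relevant. Recall: share of relevant docs found."""
    relevant = {doc_id for doc_id, (_, is_relevant) in DOCS.items() if is_relevant}
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


for name, hits in (("keyword", keyword_search("terminate")),
                   ("concept", concept_search("terminate"))):
    precision, recall = precision_recall(hits)
    print(f"{name}: hits={sorted(hits)} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the expanded query finds all three termination documents that the plain keyword missed two of, but it also pulls in both fire-related documents, so recall goes up while precision goes down.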

So concept search can help address the under-inclusive problem with keyword search (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with content categorization.  These technologies have largely been designed as “black boxes”, which, as I have discussed in the past, makes sense for enterprise search but not for e-discovery search and, as a result, could also be difficult to explain and defend.  For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.

Demystifying Concept Search in Electronic Discovery

Tuesday, October 28th, 2008

Concept or content search continues to be a hot topic within the e-discovery community.  There’s a continuous stream of articles that discuss it, some pointing out the positives and others the limitations.  The courts have also gotten involved in the discussion.  Judge Grimm refers to concept search in e-discovery in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  Judge Facciola discusses concept search in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 and other opinions.  Despite (or maybe because of) all the commentary on this topic, I find that while a lot of people think that concept search in e-discovery is good, many are not fully sure of exactly what concept search is and how it is practically useful in e-discovery.  It’s pretty clear that after several years of commentary and hype, concept search has become something of a buzzword associated with many myths and misconceptions.  In an effort to better understand what concept search is and how it can help in e-discovery, I want to dispel two of the most common myths I have heard.

The “Concept Search is Concept Search” Myth

The first myth around concept search actually revolves around what it is.  In my experience, people tend to lump two different technologies together when talking about concept search: concept search and concept categorization.  It’s very common, for example, to see commentators say concept search even when what they are really talking about is concept categorization.  To make matters more confusing, people also use a plethora of other names including content search, content clustering or concept clustering when what they really mean is concept categorization.

So, what are the differences between concept search and concept categorization?  First, let’s start with concept search.  Concept search technologies find documents containing “concepts”.  I think that the Sedona Conference’s “Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery” provides a good definition of “concept” when used in a search context: “the combination of [a] query term and the additional terms identified by the thesaurus.”  In other words, concept search technologies find documents containing a specified term plus additional terms with similar meanings derived from a thesaurus.

Concept categorization, on the other hand, is actually not a search technology at all.  Concept categorization technologies do not “find” documents.  Rather, they categorize or group documents based on their similarity.   There are many different ways to group documents based on similarity.  Techniques include statistical (which assesses similarity based on word frequency), Bayesian classification (which weights words differently depending on factors in addition to statistical frequency, such as where the terms appear in a document), and semantic indexing (which takes into account the fact that many words used in a similar context may have a similar meaning).  It would take more time to describe these technologies in detail but the Sedona commentary has a good summary of these different technologies if you are interested in learning more.
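
As a rough illustration of the statistical flavor only, the Python sketch below groups documents whose word frequencies are similar.  The cosine similarity measure, the 0.5 threshold, the greedy grouping rule and the sample documents are all assumptions chosen for the example, not a description of how any particular product works.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count how often each word appears in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Similarity of two word-frequency vectors, from 0.0 to 1.0."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_documents(documents, threshold=0.5):
    """Greedy grouping: a document joins the first group whose first
    member is similar enough; otherwise it starts a new group."""
    vectors = {doc_id: word_counts(text) for doc_id, text in documents.items()}
    groups = []
    for doc_id in documents:
        for group in groups:
            if cosine_similarity(vectors[doc_id], vectors[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups

# Hypothetical documents, for illustration only.
documents = {
    "a": "merger agreement draft merger terms",
    "b": "draft merger agreement terms revised",
    "c": "lunch menu for friday",
}

print(group_documents(documents))  # [['a', 'b'], ['c']]
```

Notice that no query is involved at any point: the documents are grouped against each other, which is exactly why this is categorization rather than search.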

As should now be apparent, these technologies are very different, and using the same words to describe them is confusing.  This is why it’s not surprising that a lot of the users of e-discovery services and software don’t have a strong understanding of what these technologies are or what benefits they can actually provide in practice.  Dispelling the myth that they can be lumped together is a critical first step in any conversation about concept search and how it can help in e-discovery.  This leads us to a second myth: that Concept Search is better than Keyword Search.  I’ll discuss this in my next blog post.

Opening Moves in E-Discovery

Friday, September 19th, 2008

I was recently asked: “what are the first things you do when your client calls you about a case requiring e-discovery?”  So, for the benefit of all, I’ll post my answer.

My first caveat to the advice was context: while a lot of attorneys have attended CLEs or have read about e-discovery, it’s not the same in the real world.  As the old Spanish Proverb goes:

It’s not the same to talk of bulls as to be in the bullring.

Keeping in mind that reality may differ significantly from academics, here are some things to consider when the next e-discovery case comes up.   Please also keep in mind that these steps (like the EDRM workflow) aren’t linear and may in fact occur cyclically or in parallel:

1. Preserve, preserve, preserve

Nothing is more important than meeting the initial preservation obligation, which begins when litigation is “reasonably likely” – as opposed to just when the complaint is filed.  This first step in the long journey can easily be a trap for the unwary/unprepared.

The challenge once you’re past the trigger issue is to then identify the boundaries of the duty to preserve, i.e., what evidence must be preserved?   This inquiry often initially consists of identifying key players, date ranges and data types.

Another significant challenge in this step is to monitor and update the legal hold process.  And, given that litigation more often than not spans years, it’s easy to initially succeed at the preservation effort, but then later fail on execution.  The best way to minimize risk in this step is to move quickly from preservation to collection.  See Is Preservation in E-Discovery Overrated?

2. Work backwards

Once preservation (and ideally collection) is adequately covered, the next step is to start thinking about the end of the process and what success (or lack of failure) looks like.  The exposure and profile of the matter are important to consider when you embark upon an e-discovery project since it’s critical to scale discovery efforts appropriately.

One thing, in particular, that is very important to consider early in the process is the type of production format that will be preferred by reviewing counsel and the opposition.  TIFF-based image productions (which are historically well accepted) are often pitted against native file ESI reviews.  Either format may or may not be acceptable given the situation and the applicability of FRCP Rule 34.

3. Understand the technical landscape

Most attorneys, but for a rare few, aren’t capable of really comprehending technical nuances of the complex and interrelated IT systems found at most Fortune 2,500 enterprises.  Fortunately, they are quite adept at working with experts (either consulting or testifying) to help them get to the bottom of issues that are difficult to comprehend and explain.  The key is to find the right technical people who understand IT systems and who can explain them to judges, juries, and attorneys alike, especially for some of the most common ESI repositories like: email servers, archival systems, shared network drives, instant messaging servers, archival repositories (e.g., tape libraries, real time back-up systems, etc.), records management systems, knowledge management systems, proprietary, but highly leveraged, internal applications, offsite repositories (e.g., hosted IT or email systems) and significant partner or subsidiary data stores.  In many instances it will make sense to leverage or create a map of the data universe so that nothing is missed and inaccessibility arguments can be cogently detailed.

4. Get your lingo straight

Assumptions, whether in e-discovery or not, are often dangerous.  In a complex undertaking where multiple parties are handling ESI, it’s critical to make sure that everyone is on the same page, especially since every company handles IT, records management, ILM and information security differently.  The outset of an engagement is the right time to align these disparate constituents, so standardize on a set of commonly used terms.  Examples of potentially ambiguous terms include “imaging”, “archive”, and “records.”

5. Don’t assume your client will really be helpful

I’ve been involved with hundreds of e-discovery engagements and I’ve found that almost universally the end client professes a profound willingness to help out.  And yet, actual “help” is relatively rare.  To qualify this, it may be prudent to ask several additional questions:

  • Does the Client have the time to actually help?  Everyone at the client’s site has a day job that they’re tasked with above and beyond transient e-discovery needs.  So, while bandwidth generally is important, what’s more critical is the ability to comply with aggressive judicial deadlines.
  • Are the people helping the ones you’d want to see on the stand?  It’s often not realistic to have internal folks (especially IT and Records Managers) stay isolated during the various pre-trial events – meet & confer conferences and potentially 30(b)(6) depositions so it’s important to evaluate how a given witness will fare when providing testimony.
  • How likely is it that your client would throw you under the bus if things went wrong?  In my opinion, there is now more reason for outside counsel to manage the risks of an e-discovery project going awry.  See, Sullivan and Cromwell’s suit against EED.  Some will wisely bring in 3rd party consultants/experts to have a neutral, unbiased constituent in the process.

6. Build a budget and team (internal/external)

Everyone is probably now aware of how expensive e-discovery can be if managed improperly.  This makes it all the more imperative to work quickly to get a rough sense of the scope (which will lead to a budget) and the client’s willingness to absorb associated charges.  The most important step is to right-size the e-discovery effort with the risks inherent in the corresponding litigation/investigation.  Otherwise, there’s a high likelihood that the e-discovery process will be over-engineered (too expensive) or under-scoped (cutting dangerous corners).

7. Figure out your risk profile

Similar to right-sizing the budget, it also makes sense to adopt a “horses for courses” approach to e-discovery since there is no singular way to handle a given matter.  For example, in one case you may take forensic images, restore backup tapes, capture instant messaging data, harness metadata, or decide to do an automated review with a “clawback” provision. In any case, the only mistake is to assume that an approach from another, dissimilar matter is warranted in the instant case.

8. Assume the opposition is better informed than you are

While this actually may not be the case, it’s a safer bet than assuming a level of naiveté that may not exist.  What is certain is that the Plaintiff’s bar is increasingly well informed and can be very aggressive.  They’ve seen the playbook that calls for baiting the opposition into a discovery misstep that can result in significant, case altering sanctions.  According to a recent survey, 63% of the polled attorneys said that e-discovery is being abused by counsel, so it’s important to be wary initially.

It’s also important to consider the potential reciprocity of a given matter and adjust your position accordingly.  In many instances it’s easy to consider your role only as a producing party, but with cross/counter claims it may be possible to simultaneously be propounding discovery and in the opposition’s shoes.

9. Prepare for an early case assessment

A recent industry survey found that effective early case assessment (ECA) approaches reduced overall litigation in half of the cases evaluated, and resulted in favorable outcomes for 76 percent of the cases.   The key to this methodology is to use the available next generation case analysis solutions earlier in the process, not just to review data for relevancy and privilege, but to:

  • Identify the key players. This is critical in order to have a defensible legal hold process
  • Evaluate the posture of the case to determine how it looks on the merits
  • Diagnose potential outliers in the e-discovery process to facilitate meet and confer discussions and help create “inaccessibility” arguments
  • Conduct a search term analysis for keyword negotiations during meet and confer discussions.  Objectively demonstrating the results of proposed search queries can go a long way in speeding up keyword negotiations

10. Don’t take search for granted

For many attorneys, e-discovery search is just like Lexis or Google.  Unfortunately, that isn’t the case.  Instead, it’s become highly complex and is now receiving significant judicial scrutiny.  In Victor Stanley v. Creative Pipe Judge Grimm suggested that attorneys need to rethink how they’ve traditionally managed the search process:  “[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”  It’s now important to devise (and share at early meet & confer conferences) a defensible search strategy that can withstand judicial scrutiny.

Judge Grimm, Victor Stanley, And The Problem Of “Black-Box” E-Discovery Search

Friday, August 22nd, 2008

Judge Paul Grimm’s recent opinion in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008) provides valuable guidance on one of the most important issues in e-discovery: how to conduct keyword searches in a defensible manner given that keyword searches are prone to produce over- and under-inclusive results.  The ruling suggests one of two approaches: either producing parties should adopt a “collaborative” approach to conducting keyword searches, whereby each party agrees on a search methodology; or, they should use a “best practices” approach, such as the one suggested by Sedona, where the producing party tests, samples, and iteratively refines searches so that they can demonstrate they have taken reasonable measures to reduce over- and under-inclusive results.

While the guidance is clear, following the guidance in practice is very difficult.  The primary reason for this is that the search technology being used in e-discovery today is not up to the task.  Specifically, today’s search technology suffers from three problems:

  1. The over- and under-inclusive tradeoff. Many technologies have been developed to address the tendency of keyword searches to miss relevant documents and produce under-inclusive results.  Wildcard and stemming technology has been developed in order to address the issue of finding common word variations in specified keywords.  Concept search has been designed to find documents containing words with similar meanings to the keywords in a search.  And fuzzy search technologies have been put in place to find misspellings of words. However, all of these suffer from the same problem: they produce too many non-relevant or “false positive” documents thus driving up the cost of review. For example, if someone runs the wildcard search “divers*”, then he or she not only gets the desired documents containing “diverse” and “diversity”, but also gets a large number of false positive documents containing “diversion”, “diversification”, and so on.  In the case of concept and fuzzy search, the problem is so great that these technologies to date have rarely been used in e-discovery.
  2. Too expensive to test, sample and refine searches. Today’s search technologies are largely designed to run one search at a time, not the dozens of searches that are typical in e-discovery. As a result, anyone trying to follow the best practices of testing, sampling, and refining each search will find themselves missing deadlines and running over budget because it takes so long. This also makes collaboration with the opposing party close to impossible, since there’s little time to iterate on – and agree upon – a set of keyword searches.
  3. Manual documentation. It’s not enough for producing parties to use best practices; they have to document them so that they can “show their work” to the court. Currently, documenting the search refinement process is mostly manual, with the result that it is either done inadequately or not at all.
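
To give a feel for what testing, sampling and documenting a search might involve, here is a minimal Python sketch.  It draws a reproducible random sample from a query’s hits, records how many sampled documents a reviewer coded as relevant, and appends the result to a log that could later be shown to the court.  The document ids, the query, the sample size and the reviewer’s coding are all hypothetical, and the precision figure is a simple sample proportion rather than a statistically rigorous estimate.

```python
import json
import random

def draw_sample(hit_ids, sample_size, seed):
    """Draw a reproducible random sample of document ids from a query's hits."""
    rng = random.Random(seed)
    ids = sorted(hit_ids)
    return rng.sample(ids, min(sample_size, len(ids)))

def log_iteration(log, query, hit_count, sample_ids, relevant_ids):
    """Append one search refinement step to the audit log and return it."""
    relevant_in_sample = sum(1 for doc_id in sample_ids if doc_id in relevant_ids)
    entry = {
        "query": query,
        "hits": hit_count,
        "sample_size": len(sample_ids),
        "relevant_in_sample": relevant_in_sample,
        "estimated_precision": (
            relevant_in_sample / len(sample_ids) if sample_ids else None
        ),
    }
    log.append(entry)
    return entry

# Hypothetical hit list for a hypothetical wildcard query.
hit_ids = [f"DOC-{i:04d}" for i in range(200)]
sample = draw_sample(hit_ids, sample_size=20, seed=42)

# Hypothetical reviewer coding: pretend 5 of the 20 sampled documents were relevant.
relevant_ids = set(sample[:5])

audit_log = []
entry = log_iteration(audit_log, "divers*", len(hit_ids), sample, relevant_ids)
print(json.dumps(audit_log, indent=2))
```

Running this once per candidate query is cheap for a script; the problem described above is that most search tools make each of these iterations a slow, manual exercise.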

The reason why the search technology used for e-discovery has these problems is surprisingly simple: it’s because the technology was not designed for e-discovery in the first place. Rather, it was built for enterprise search, and was only later repurposed towards e-discovery.

The “Black Box” Of Enterprise Search

The core issue is that enterprise search technology has been designed to be a “black box”. Users enter a single search query into one end, and get results at the other, with no visibility into what happens in between. Going back to our previous example, when a user searches for “divers*” intending to find documents related to “diversity” or “diverse”, enterprise search engines give the user no visibility into the crucial step of query expansion and how it expands the search query into relevant and non-relevant terms like “diversion” and “diversification”. As a result, the user has no ability to minimize the false positives.
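
To show what visibility into query expansion could look like, here is a minimal Python sketch, not how any particular search engine works.  It lists every term a wildcard matches along with the number of documents containing it, and then lets the user exclude the unwanted terms before running the search.  The documents and function names are hypothetical.

```python
import fnmatch
import re
from collections import Counter

def tokenize(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def expand_wildcard(pattern, documents):
    """Return {matching term: number of documents containing it}."""
    counts = Counter()
    for text in documents.values():
        for word in tokenize(text):
            if fnmatch.fnmatchcase(word, pattern.lower()):
                counts[word] += 1
    return dict(counts)

def search_with_exclusions(pattern, documents, excluded_terms):
    """Return ids of documents containing at least one non-excluded match."""
    kept = set(expand_wildcard(pattern, documents)) - set(excluded_terms)
    return sorted(
        doc_id for doc_id, text in documents.items() if tokenize(text) & kept
    )

# Hypothetical documents, for illustration only.
documents = {
    "d1": "Our diversity program supports a diverse workforce.",
    "d2": "Traffic diversion on the highway.",
    "d3": "Portfolio diversification strategy and diversion of funds.",
    "d4": "Diverse hiring panel.",
}

print(sorted(expand_wildcard("divers*", documents).items()))
# [('diverse', 2), ('diversification', 1), ('diversion', 2), ('diversity', 1)]

print(search_with_exclusions("divers*", documents, {"diversion", "diversification"}))
# ['d1', 'd4']
```

Once the expansion is visible, dropping “diversion” and “diversification” cuts the result set in this toy example from four documents to the two that are actually about diversity.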

In the same vein, when a user enters multiple queries into a “black box” enterprise search engine, all of the queries run as a single search, and the user has no visibility into which results are associated with which query. For example, a user that searches for “hiring OR interview” will get the results for the combination of the queries “hiring” and “interview”. He or she won’t know that only 5 documents contained “hiring” while 100 documents contained “interview.”  This limitation makes analyzing, sampling and refining searches costly and time consuming.
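
The per-query visibility described above can be sketched in a few lines of Python.  The example breaks an OR query into its individual terms and reports how many documents each term hits, alongside the combined total.  The documents are hypothetical, and the simple whole-word matching is an assumption made to keep the example short.

```python
import re

def per_term_hits(terms, documents):
    """Return {term: set of ids of documents containing that term}."""
    hits = {term: set() for term in terms}
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        for term in terms:
            if term.lower() in words:
                hits[term].add(doc_id)
    return hits

def or_query_report(terms, documents):
    """Report hit counts for each term and for the combined OR query."""
    hits = per_term_hits(terms, documents)
    combined = set().union(*hits.values())
    report = {term: len(ids) for term, ids in hits.items()}
    report["(combined OR)"] = len(combined)
    return report

# Hypothetical documents, for illustration only.
documents = {
    "e1": "Hiring plan for Q3",
    "e2": "Interview schedule",
    "e3": "Interview feedback and hiring decision",
    "e4": "Budget review",
}

print(or_query_report(["hiring", "interview"], documents))
# {'hiring': 2, 'interview': 2, '(combined OR)': 3}
```

The combined count is smaller than the sum of the per-term counts because one document contains both terms, which is precisely the kind of detail a “black box” hides.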

That’s not to say that enterprise search products like Autonomy or Endeca are flawed. Far from it.  Their “black box” design works exceedingly well for the simple and quick queries that people want to run across the enterprise for general business purposes. If a sales manager is looking for a single proposal for her meeting the following day, then she doesn’t care how the search was performed or if it’s over-inclusive.  She’s only interested in the first page of relevant results, and for that use case enterprise search engines do a great job.

But e-discovery is a whole different world.  In e-discovery, users typically must review every single document in the search results, not just the most relevant ones.  As a result, over-inclusive searches can dramatically increase the costs of downstream production and review.  And under-inclusive searches raise the issue of defensibility.  Finally, e-discovery users have to run a lot of search queries and understand which documents are associated with each of those queries.

So, going back to the original problem, if current search technologies cannot help lawyers and litigation support professionals follow Judge Grimm’s guidance and address the “well-known limitations” of keyword search, what can? That will be the subject of my next post.

Review-less E-Discovery Review

Monday, July 21st, 2008

Most science fiction visions of the distant future seem to contain a rather singular fear: that the human race will be taken over by computers.  Think “Terminator” series, preferably without the naked Arnold Schwarzenegger visual.  Regardless of whether this vision fills you with trepidation or excitement there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.

For the past several decades, attorneys have had to use litigation software to manually review information for relevancy and privilege in response to the e-discovery process.  Quoting from Information Inflation: Can the Legal System Adapt? by George Paul and Jason Baron, this task has always been viewed as sacrosanct “because of ‘death penalty’ waiver doctrine that evolved long ago when information was still manageable.”

Like so many industries, the legal profession has attempted to grapple with the transformation that the digital revolution has brought to the forefront.  The latest revisions to the Federal Rules of Civil Procedure (FRCP) are the most obvious case in point.  And yet, electronically stored information (ESI) is proving difficult to fit into traditional, even remodeled, paradigms.  Even ignoring (for the moment) the proliferation of novel data types (e.g., blog content, voice over IP or VOIP, webmail, text messaging, web services), the amount of data that attorneys are being required to review during litigation discovery has reached a tipping point of review feasibility.

Back in the day, information was measured in terms of bankers boxes, and even in the most document intensive discovery matters this measuring stick reflected the belief that armies of attorneys could conceivably conquer the massive document review problem.  But now, we often see clients that process routine matters containing terabytes of information.  Most of us in the electronic data discovery space have become numbed to the abstract nomenclature of megabytes, gigabytes, terabytes [i], petabytes [ii], and in the process we may have failed to realize that we have moved well beyond the scale of information that can be reasonably attacked with even the largest armada of contract attorneys (assuming that the client could conceivably bear the astronomical costs).

“At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.” [iii]

I’m certainly not the first to point out that this tipping point is coming, but now we are really starting to see early adopters respond to this sea change. In their linked article above, George Paul and Jason Baron state “It is no exaggeration to say that litigation, as we have known it, is threatened by information’s new hyper-flow. The amount of electronically stored information relevant to a case is already a stress point in litigation.  […]  Litigators can no longer depend on manual review alone….”

Up until now, attorneys and the clients that are footing the litigation discovery bill have had to make a Hobson’s choice:  either “force parties to continue hugely expensive privilege reviews, or to forego the attorney-client privilege or work-product privilege altogether.”   But, now it appears that another way is evolving.

The following lays out a scenario where a non-manual review methodology may make sense.  ***Please note: this approach is not without risk.  At this moment in time neither clawback provisions, the potential adoption of Evidence Rule 502 nor any other known prophylactic measure can completely insulate a producing party from the unforeseen consequences of an inadvertent disclosure.  But, as they say, desperate times call for desperate measures….

Step one: Evaluate the Environment

The following factors represent some of the elements that should be taken into consideration prior to skipping the normal, human based review steps that are seen in most e-discovery matters.

  1. Large data set.  This may sound a bit obvious, but a non-manual approach is best suited for large, unwieldy data sets.  The corpus doesn’t need to be in the terabytes, but the data set should be evaluated in terms of discovery processing costs and attorney review estimates.
  2. Short Production Timelines.  Once the above calculations are conducted, the next step is to determine if a human based review could even conceivably be conducted in the given time frame.  In many instances, an eyes-on review process just won’t be feasible since there won’t be enough bodies to throw at the problem.
  3. Next Gen “PAR” Tools.  In order to pull this “review-less” review process off, both safely and quickly, the responding party needs to have access to fast, robust processing, analysis and review (“PAR”) tools.  Certainly, it’s possible to have this scenario work with an e-discovery service provider, if they have the capability.
  4. Relatively Small Amount in Controversy.  For the time being, this approach should not be considered for any “bet the company” litigation, nor anything with significant downside risk (governmental inquiries, punitive damages, class actions, 2nd requests, etc.).  Yet, for many standard commercial lawsuits, corporate investigations, HR claims, etc. this review-less approach may be worth considering.
  5. Ability to Use a Clawback Provision.  Entering into a clawback provision with the opposition is mandatory in this methodology since the chances of an inadvertent production are statistically ever-present.  Yet, until Evidence Rule 502 is resolved, there will always be a risk that the clawback won’t be enforceable against 3rd parties.
  6. Non-governmental Production.  Most information in governmental productions becomes part of the public record, meaning that a clawback isn’t going to be feasible.  Here, trade secret information, personally identifiable data and the like would be disastrous if pushed out into the public domain.
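
For the first two factors, a back-of-the-envelope calculation is often enough to show whether a manual review is even feasible.  The Python sketch below uses the per-terabyte estimates cited in the first footnote to this post (75 million pages and $18,750,000 per terabyte, which works out to $0.25 per page).  The reviewer throughput of 2,000 pages per day and the 30 day deadline are purely hypothetical inputs chosen for illustration; substitute your own numbers.

```python
import math

# Estimates from the footnote to this post (Kershaw, 2005).
PAGES_PER_TERABYTE = 75_000_000
REVIEW_COST_PER_TERABYTE = 18_750_000  # dollars
COST_PER_PAGE = REVIEW_COST_PER_TERABYTE / PAGES_PER_TERABYTE  # 0.25 dollars

def estimate_manual_review(terabytes, pages_per_reviewer_day, days_available):
    """Rough page count, review cost and headcount for a manual review.

    pages_per_reviewer_day and days_available are assumptions supplied
    by the caller; they do not come from the cited estimates.
    """
    pages = terabytes * PAGES_PER_TERABYTE
    cost = pages * COST_PER_PAGE
    reviewers_needed = math.ceil(pages / (pages_per_reviewer_day * days_available))
    return {"pages": pages, "cost": cost, "reviewers_needed": reviewers_needed}

# Hypothetical matter: 1 terabyte, 2,000 pages per reviewer per day, 30 days.
estimate = estimate_manual_review(
    terabytes=1, pages_per_reviewer_day=2_000, days_available=30
)
print(estimate)
```

Under these assumed inputs the review would need on the order of 1,250 reviewers working every day until the deadline, which illustrates why there may simply not be enough bodies to throw at the problem.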

Step two: Perform a Risk/Benefit Analysis

Next, take all the above factors into consideration and determine if the risks (of inadvertent production, the clawback being ineffective, etc.) are worth the benefits (reduced costs, lower attorney review fees, ability to meet deadlines, etc.).

Sure, this is hard work, but the alternative (manual review) is more ephemeral than realistic.

[In my next post, I’ll address the tactical steps to conduct a review-less review process.  Stay tuned……]

[i] One terabyte is generally estimated to contain 75 million pages and could conceivably cost $18,750,000 to review.  Anne Kershaw, Automated Document Review Proves Its Reliability, 5 DIGITAL DISCOVERY & E-EVIDENCE 11 (2005).

[ii] According to Wired, we’re now in the “Petabyte Age” where that amount of data is processed by Google’s servers every 72 minutes.

[iii] Wired article, above.