Archive for the ‘e-discovery software’ Category

E-Discovery In The Press

Thursday, October 2nd, 2008

Last month, for the first time, friends of mine who do NOT work in the legal industry started talking to me about e-discovery. In the past, they had always taken on the glazed look of a bored 8th-grader whenever I spoke about what I do. But suddenly, they were strangely interested and full of questions.

The reason was two articles about e-discovery in the mainstream media which appeared within a week of each other. The first was in the Wall Street Journal, which wrote about how tech firms are at war with lawyers. According to the Journal, the fact that companies are saving money by using e-discovery software is bad news for lawyers, since they are “facing the loss of lucrative client fees.” In response, the lawyers are fighting back: “The attorneys counter that there are pitfalls to replacing them. Early this year, a federal judge required chip maker Qualcomm to pay rival Broadcom more than $8 million after it failed to uncover and share emails relevant to a case.”

I am sure there are lawyers who see technology as a threat, but the firms I deal with are actively embracing e-discovery technology, not fighting it. They see it as another way they can add value to their clients, and would prefer to have their staff focused on practicing law, not mindlessly reading irrelevant documents. So I ended up spending a lot of time explaining to my non-legal friends that there are two sides to the coin. As for my friends who do happen to be lawyers, they focused on the Qualcomm case, pointing out (as we have written before) that the problem was not technology, but rather poor processes and bad judgment on the part of the attorneys concerned.

The second article appeared in the Economist and took a different tack. It argued that the stratospheric cost of e-discovery is gumming up the court system and preventing justice from being served. According to one former justice from Colorado quoted in the article, even mundane landlord-tenant disputes “are now digital wars of attrition”; there are “cases that are settled only because one party cannot afford the costs of e-discovery”; and, many “plaintiffs cannot afford to sue at all, for fear of the e-discovery costs.”

I love the Economist’s tongue-in-cheek style and thought the article made many valid points. My one disappointment was that its spin was unequivocally negative, as though e-discovery is a self-inflicted wound on the American judicial system. Nowhere was there mention of the fact that electronic evidence often helps litigants get at the truth. Rather than incomplete recollections or “he said-she said” claims and counter-claims, there’s no disputing an email that captures a person’s words and actions in black-and-white. Nor was there any mention of how technology is solving the problems that it inadvertently created: today, there are many products that rapidly sift through electronic information, dramatically lowering the cost of e-discovery.

It is great for everyone in the e-discovery community for our domain to get more ink in mainstream, quality publications. I expect that the trend will continue as the industry grows, and especially once the investigations start into our current financial meltdown.

“Aggressive Culling”: The E-Discovery Buzz Cut

Tuesday, September 30th, 2008

Ralph Losey, never one to mince words, recently analyzed a litigation survey from the elite Fellows of the American College of Trial Lawyers. The survey highlights the fact that one of the main problems facing the U.S. legal system today is (surprise!) e-discovery. Also (not) a surprise is that the study “places the blame squarely on poor rules, bad law, and judges”, while overlooking the role that lawyers play in the problem.

In his analysis, Ralph makes a number of insightful observations that should help lawyers move from being e-discovery troublemakers to being part of the solution. However, one of his key critiques is targeted not at lawyers but rather at the vendor community: “[E-discovery] is too expensive because lawyers and judges do not know what they are doing, and do not know how to properly cull and review email, and because clients are disorganized pack-rats. Many of the e-discovery vendors are also misinformed, but often they do know better; they just have no pecuniary interest in aggressive culling. Some may even seek to line their own pockets in inflated discoveries.”

As Ralph bluntly points out, pecuniary interests (translation: money) play a big role here, but so does risk reduction. Imagine you’re given the opportunity to process a 2 terabyte case all the way through to review. With the “funnel” of e-discovery costs placing the highest dollar per gigabyte value on the end of the process (i.e. review), what’s your incentive to cull aggressively at the beginning? Not much from a revenue perspective, certainly, but also not much from a risk perspective, particularly when you have sanctions and lawsuits on your mind and are thinking about the potential liability that you incur by excluding potentially relevant documents by using too broad a brush (or pair of garden clippers) in your pruning.

How do we move forward? As document volumes continue to grow, it’s clear that aggressive culling (with a few caveats which we’ll get to in a minute) is a critical tool for managing costs and improving case outcomes (let’s go out on a limb and define “improving” as producing fairer and more equitable rulings). However, in order to adopt more aggressive culling as a standard part of the electronic discovery process, the community has to come to terms with three things:

  • The Myth of Perfection: There may be perfect abs, but there is no perfect e-discovery. Organizations like the E-Discovery Institute are doing fantastic work to measure and improve the accuracy of electronic discovery efforts, but in the end it’s tough to make the argument that having 100 contract attorneys manually reviewing 10 million documents will necessarily produce a better overall e-discovery outcome than 10 specialized attorneys reviewing 200,000 documents that were aggressively (but thoughtfully) culled from the initial 10 million document set. There simply is no black and white set of rules that will lead to a perfect process.
  • The Benefit of Cost Control: Given that, it is in the best interest of everyone involved (yes, even vendors) to choose the most cost-effective process that provides a high likelihood of producing the information relevant to the case.  This means “saving your bullets” by not spending all of your e-discovery dollars up front in a case pursuing the perfection myth, but instead approaching discovery in an incremental fashion which can adapt to changing facts and circumstances as the matter unfolds. How, you may ask, do vendors benefit? They can become more strategic e-discovery advisors by working with counsel over the full lifecycle of a case, providing higher-value (and, by the way, more interesting and intellectually challenging) consulting services to help incrementally adjust and adapt the course of e-discovery. As Ralph puts it: “…Trial lawyers should accept that specialists in the field of e-discovery are a necessary evil. If an e-discovery specialist knows the field, they can save you money and take you out of the e-discovery morass faster and more reliably than a dozen new rules. The world today is too complex for one man or woman to do it all.”
  • The Value of Defensibility: Many of you likely winced at the term “high likelihood” in the previous point. “Sacrilege!” you cried. “I demand certainty!” First, go back and re-read the first point about the Myth of Perfection. Then, consider that a better way forward may be an approach to e-discovery that involves more aggressive culling early in the process to focus on the most important documents first, more iterations to adapt to changing facts and circumstances, and, all along the way, a complete audit trail that provides defensibility in the event that any aspect of the process is ever questioned. Such defensibility would include specific documentation about the culling decisions that were made, down to the keyword and “sub-keyword” (i.e. wildcard expansion) level, so all the cards are on the table for everyone to see.  The value of defensibility when performing aggressive culling is enormous, in that it adds an additional measure of safety and trust to the process, minimizing the amount of doubt and second-guessing that so often plagues e-discovery negotiations.

By coming to terms with the fundamental imperfections of the e-discovery process and embracing the promise of lower costs and the agility and responsiveness that can be gained with a more iterative approach, everyone stands to gain from the safe and controlled adoption of aggressive culling – yes, even the vendors (at least the smart ones) and their ever-present pecuniary interests.

Opening Moves in E-Discovery

Friday, September 19th, 2008

I was recently asked: “what are the first things you do when your client calls you about a case requiring e-discovery?”  So, for the benefit of all, I’ll post my answer.

My first caveat to the advice was context: while a lot of attorneys have attended CLEs or have read about e-discovery, it’s not the same in the real world.  As the old Spanish proverb goes:

It’s not the same to talk of bulls as to be in the bullring.

Keeping in mind that reality may differ significantly from academics, here are some things to consider when the next e-discovery case comes up.   Please also keep in mind that these steps (like the EDRM workflow) aren’t linear and may in fact occur cyclically or in parallel:

1. Preserve, preserve, preserve

Nothing is more important than meeting the initial preservation obligation, which begins when litigation is “reasonably likely” – as opposed to just when the complaint is filed.  This first step in the long journey can easily be a trap for the unwary/unprepared.

The challenge once you’re past the trigger issue is to then identify the boundaries of the duty to preserve, i.e., what evidence must be preserved?   This inquiry often begins with identifying key players, date ranges and data types.

Another significant challenge in this step is to monitor and update the legal hold process.  And, given that litigation more often than not spans years, it’s easy to initially succeed at the preservation effort, but then later fail on execution.  The best way to minimize risk in this step is to move quickly from preservation to collection.  See Is Preservation in E-Discovery Overrated?

2. Work backwards

Once preservation (and ideally collection) is adequately covered, the next step is to start thinking about the end of the process and what success (or lack of failure) looks like.  The exposure and profile of the matter are important to consider when you embark upon an e-discovery project since it’s critical to scale discovery efforts appropriately.

One thing, in particular, that is very important to consider early in the process is the type of production format that will be preferred by reviewing counsel and the opposition.  TIFF-based image productions (which are historically well accepted) are often pitted against native file ESI reviews.  Either format may or may not be acceptable given the situation and the applicability of FRCP Rule 34.

3. Understand the technical landscape

Most attorneys, but for a rare few, aren’t capable of really comprehending technical nuances of the complex and interrelated IT systems found at most Fortune 2,500 enterprises.  Fortunately, they are quite adept at working with experts (either consulting or testifying) to help them get to the bottom of issues that are difficult to comprehend and explain.  The key is to find the right technical people who understand IT systems and who can explain them to judges, juries, and attorneys alike, especially for some of the most common ESI repositories like: email servers, archival systems, shared network drives, instant messaging servers, archival repositories (e.g., tape libraries, real time back-up systems, etc.), records management systems, knowledge management systems, proprietary, but highly leveraged, internal applications, offsite repositories (e.g., hosted IT or email systems) and significant partner or subsidiary data stores.  In many instances it will make sense to leverage or create a map of the data universe so that nothing is missed and inaccessibility arguments can be cogently detailed.

4. Get your lingo straight

Assumptions, whether in e-discovery or not, are often dangerous.  In a complex undertaking where multiple parties are handling ESI, it’s critical to make sure that everyone is on the same page, especially since every company handles IT, records management, ILM and information security differently.  The outset of an engagement is the right time to align these disparate constituents, so standardize on a set of commonly used terms. Examples of potentially ambiguous terms include “imaging,” “archive,” and “records.”

5. Don’t assume your client will really be helpful

I’ve been involved with hundreds of e-discovery engagements and I’ve found that almost universally the end client professes a profound willingness to help out.  And yet, actual “help” is relatively rare.  To qualify this, it may be prudent to ask several additional questions:

  • Does the Client have the time to actually help?  Everyone at the client’s site has a day job that they’re tasked with above and beyond transient e-discovery needs.  So, while bandwidth generally is important, what’s more critical is the ability to comply with aggressive judicial deadlines.
  • Are the people helping the ones you’d want to see on the stand?  It’s often not realistic to have internal folks (especially IT and Records Managers) stay isolated during the various pre-trial events (meet & confer conferences and potentially 30(b)(6) depositions), so it’s important to evaluate how a given witness will fare when providing testimony.
  • How likely is it that your client would throw you under the bus if things went wrong?  In my opinion, there is now more reason for outside counsel to manage the risks of an e-discovery project going awry.  See, Sullivan and Cromwell’s suit against EED.  Some will wisely bring in 3rd party consultants/experts to have a neutral, unbiased constituent in the process.

6. Build a budget and team (internal/external)

Everyone is probably now aware of how expensive e-discovery can be if managed improperly.  This makes it all the more imperative to work quickly to get a rough sense of the scope (which will lead to a budget) and the client’s willingness to absorb associated charges.  The most important step is to right-size the e-discovery effort with the risks inherent in the corresponding litigation/investigation.  Otherwise, there’s a high likelihood that the e-discovery process will be over-engineered (too expensive) or under-scoped (cutting dangerous corners).

7. Figure out your risk profile

Similar to right-sizing the budget, it also makes sense to adopt a “horses for courses” approach to e-discovery since there is no singular way to handle a given matter.  For example, in one case you may take forensic images, restore backup tapes, capture instant messaging data, harness metadata, or decide to do an automated review with a “clawback” provision. In any case, the only mistake is to assume that an approach from another, dissimilar matter is warranted in the instant case.

8. Assume the opposition is better informed than you are

While this actually may not be the case, it’s a safer bet than assuming a level of naiveté that may not exist.  What is certain is that the Plaintiff’s bar is increasingly well informed and can be very aggressive.  They’ve seen the playbook that calls for baiting the opposition into a discovery misstep that can result in significant, case altering sanctions.  According to a recent survey, 63% of the polled attorneys said that e-discovery is being abused by counsel, so it’s important to be wary initially.

It’s also important to consider the potential reciprocity of a given matter and adjust your position accordingly.  In many instances it’s easy to consider your role only as a producing party, but with cross/counter claims it may be possible to simultaneously be propounding discovery and in the opposition’s shoes.

9. Prepare for an early case assessment

A recent industry survey found that effective early case assessment (ECA) approaches reduced overall litigation in half of the cases evaluated, and resulted in favorable outcomes for 76 percent of the cases.   The key to this methodology is to use the available next generation case analysis solutions earlier in the process, not just to review data for relevancy and privilege, but to:

  • Identify the key players. This is critical in order to have a defensible legal hold process
  • Evaluate the posture of the case to determine how it looks on the merits
  • Diagnose potential outliers in the e-discovery process to facilitate meet and confer discussions and help create “inaccessibility” arguments
  • Conduct a search term analysis for keyword negotiations during meet and confer discussions.  Objectively demonstrating the results of proposed search queries can go a long way in speeding up keyword negotiations

10. Don’t take search for granted

For many attorneys, e-discovery search is just like Lexis or Google.  Unfortunately, that isn’t the case.  Instead, it’s become highly complex and is now receiving significant judicial scrutiny.  In Victor Stanley v. Creative Pipe Judge Grimm suggested that attorneys need to rethink how they’ve traditionally managed the search process:  “[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”  It’s now important to devise (and share at early meet & confer conferences) a defensible search strategy that can withstand judicial scrutiny.

Why Transparent Search In E-Discovery Is The Answer To Victor Stanley

Tuesday, August 26th, 2008

In my last post, I discussed how the “black box” design of enterprise search engines makes it challenging to defensibly use keyword search in e-discovery and follow Judge Grimm’s guidance in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  In Victor Stanley, Judge Grimm notes that because keyword search technology is prone to producing over- and under-inclusive results, attorneys using keyword search should adopt one of two approaches: either collaborate with the opposing party to agree on keyword search methodology, or utilize best practices that demonstrate they have taken reasonable measures to reduce over- and under-inclusiveness.  However, the black box search technologies that are used in e-discovery today make following this guidance difficult.  They can’t reduce under-inclusiveness without increasing over-inclusiveness.  And they make it expensive to utilize collaborative or best practices methodologies including testing, sampling, refining and documenting searches.  All of which begs an obvious question: what can be done to improve search for e-discovery?

In my opinion, the answer is simple: e-discovery search needs to become more transparent.  Instead of being forced to feed one search query at a time into a “black box” search engine and then getting results  with no idea how those results were generated, lawyers and litigation support professionals need technology that provides them with greater visibility into the search process. They need to understand how the results were obtained, so they can reduce both the over- and under-inclusiveness of keyword search, and easily follow Judge Grimm’s advice to improve the defensibility of their search methodology.

A transparent search solution should have four key elements:

  1. Transparent query expansion. Query expansion is the process by which search engines take the query that the user submitted and expand or convert it into a new and improved form.  Wildcard, stemming, concept and fuzzy searches all follow this query expansion process.  For example, the search “divers*,” would be expanded to search for all the words that start with “divers” in the data set, such as “diverse,” “diversity,” “diversion,” “diversification,” etc.  In transparent search, query expansion would be exposed to users, allowing them to include or exclude expanded keywords. To continue with the previous example, a user that is searching for documents related to diversity would then have the ability to exclude false positive expanded terms, such as “divers”, “diversion,” and “diversification” from the search.  Making query expansion transparent can significantly reduce the over-inclusiveness of keyword search.  It also makes it practical to use technologies, such as concept and fuzzy search, that have not been used to date because of their complexity and tendency to produce massively over-inclusive results.
  2. Multiple query support. When a search contains multiple keyword queries, such as “hiring” and “interview,” transparent search should provide visibility into the results for each individual query as well as the combination of all the queries. For example, with the search “hiring OR interview,” users should have separate visibility into the results for “hiring” and “interview” as well as “hiring OR interview.”  They should know that out of the 100 documents that match “hiring OR interview”, only 5 match interview and 95 match hiring.  This kind of visibility is critical if you want to either collaborate or follow search testing, sampling, and refinement best practices when there are a large number of queries.
  3. Rapid sampling. Transparent search should support the ability to rapidly sample the results from all of the individual queries, such as “hiring” and “interview”, contained within a search. It should also be easy to take a random sample of non-matching documents in order to assess whether one or more searches have identified as many of the relevant documents as possible.  As Judge Grimm states in Victor Stanley when assessing keyword searches used to find privileged documents, “The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive.”
  4. Automated documentation. Transparent search technology needs to document all aspects of the search process including (but not limited to) any keyword that has been excluded during transparent query expansion, the combined results of a search containing multiple individual queries, and the results for each of the individual queries within that search.  Automatically documenting the search methodology used and the results obtained is critical so that users can “show their work” if their search methodology is ever called into question.
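To make the first and fourth elements concrete, here is a minimal Python sketch of a wildcard search that exposes its expansion, lets the user exclude false positive terms, and records every decision. It is an illustration only and is not taken from any actual product: the function names (`tokenize`, `expand_wildcard`, `transparent_search`), the simple word tokenizer, and the sample documents are all assumptions.

```python
import fnmatch
import re


def tokenize(text):
    """Lowercase a document and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_wildcard(pattern, vocabulary):
    """List every word in the data set that a wildcard such as 'divers*' matches."""
    return sorted(word for word in vocabulary if fnmatch.fnmatchcase(word, pattern))


def transparent_search(pattern, documents, excluded=()):
    """Run a wildcard search, exposing the expansion and recording each decision."""
    tokens = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    vocabulary = set().union(*tokens.values())
    expanded = expand_wildcard(pattern, vocabulary)
    kept = [word for word in expanded if word not in excluded]
    hits = sorted(doc_id for doc_id, words in tokens.items() if words & set(kept))
    # The audit record is the "show your work" documentation for this search.
    audit = {
        "query": pattern,
        "expanded_terms": expanded,
        "excluded_terms": [word for word in expanded if word in excluded],
        "searched_terms": kept,
        "matching_documents": hits,
    }
    return hits, audit


# Hypothetical sample data, invented for this illustration.
documents = {
    "doc1": "Our diversity program supports a diverse workforce",
    "doc2": "Portfolio diversification strategy",
    "doc3": "The road diversion starts Monday",
    "doc4": "Two divers found the wreck",
}
hits, audit = transparent_search(
    "divers*", documents, excluded=("divers", "diversion", "diversification")
)
print(audit)
```

In this toy data set, the wildcard expands to five terms, the user excludes three of them, and only the document about diversity remains in the results. The audit record keeps the excluded terms alongside the terms actually searched, so the culling decision can be explained later.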

Benefits of Transparent Search

By addressing the main technology challenges of keyword search, transparent search provides significant benefits to attorneys and litigation support professionals using search for e-discovery. First, parties that adopt transparent search can improve the defensibility of their e-discovery search practices. By enabling iterative testing, sampling and refinement, transparent search allows users to adopt the approaches recommended by Judge Grimm when it was previously impractical to do so.  At the end of the day, this means less risk.

Second, the use of transparent search can substantially reduce downstream production and review costs by removing false positives. For example, it is not uncommon for certain wildcard searches to generate results where 20-40% of the included documents are false positives that can be removed by transparent query expansion.  This can result in thousands of dollars of savings on a single search query.

Finally, transparent search can dramatically reduce the time and cost required to complete the search and culling stage of e-discovery. Currently, it can take hundreds of hours to run a significant number of searches one at a time, document the results of each search, and sample and refine each individual query. With transparent search, running multiple queries and documenting each of the individual results takes minutes. Sampling each of the individual queries takes seconds.
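The sampling step itself is simple to express. The following Python sketch draws a random sample from the documents a search matched and another from the documents it did not match, so a reviewer can check for both over- and under-inclusiveness. The function name `sample_for_review`, the fixed seed, and the document identifiers are assumptions made for illustration. How large a sample needs to be is a separate question that this sketch does not address.

```python
import random


def sample_for_review(all_ids, matched_ids, sample_size, seed=2008):
    """Draw random samples from both the matching and non-matching documents."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    matched = sorted(set(matched_ids))
    unmatched = sorted(set(all_ids) - set(matched_ids))
    return {
        "matched_sample": rng.sample(matched, min(sample_size, len(matched))),
        "unmatched_sample": rng.sample(unmatched, min(sample_size, len(unmatched))),
    }


# Hypothetical collection: 100 documents, of which a search matched the first 30.
all_ids = [f"doc{n}" for n in range(1, 101)]
matched_ids = all_ids[:30]
samples = sample_for_review(all_ids, matched_ids, sample_size=5)
print(samples)
```

Seeding the random generator is a design choice rather than a requirement: it makes the sample reproducible, which helps if the methodology ever has to be demonstrated again.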

When it comes to e-discovery search, it’s important to recognize that there are no “silver bullets.”  Search will remain an imperfect science with the possibility of over- and under-inclusive results.  But equally, there is no doubt that search remains the best solution for reducing the vast quantities of electronic information that are a part of every e-discovery process down to a reasonable level for human review. While attorneys and litigation support professionals can’t completely remove the imperfections of keyword search, they can, with transparent search, take action to minimize the impact of these imperfections and defensibly meet the requirements of new case law.  In doing so, they will be able to turn their attention to where it should be: the substance of the case.

Judge Grimm, Victor Stanley, And The Problem Of “Black-Box” E-Discovery Search

Friday, August 22nd, 2008

Judge Paul Grimm’s recent opinion in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008) provides valuable guidance on one of the most important issues in e-discovery: how to conduct keyword searches in a defensible manner given that keyword searches are prone to produce over- and under-inclusive results.  The ruling suggests one of two approaches: either producing parties should adopt a “collaborative” approach to conducting keyword searches, whereby each party agrees on a search methodology; or, they should use a “best practices” approach, such as the one suggested by Sedona, where the producing party tests, samples, and iteratively refines searches so that they can demonstrate they have taken reasonable measures to reduce over- and under-inclusive results.

While the guidance is clear, following the guidance in practice is very difficult.  The primary reason for this is that the search technology being used in e-discovery today is not up to the task.  Specifically, today’s search technology suffers from three problems:

  1. The over- and under-inclusive tradeoff. Many technologies have been developed to address the tendency of keyword searches to miss relevant documents and produce under-inclusive results.  Wildcard and stemming technology has been developed in order to address the issue of finding common word variations in specified keywords.  Concept search has been designed to find documents containing words with similar meanings to the keywords in a search.  And fuzzy search technologies have been put in place to find misspellings of words. However, all of these suffer from the same problem: they produce too many non-relevant or “false positive” documents thus driving up the cost of review. For example, if someone runs the wildcard search “divers*”, then he or she not only gets the desired documents containing “diverse” and “diversity”, but also gets a large number of false positive documents containing “diversion”, “diversification”, and so on.  In the case of concept and fuzzy search, the problem is so great that these technologies to date have rarely been used in e-discovery.
  2. Too expensive to test, sample and refine searches. Today’s search technologies are largely designed to run one search at a time, not the dozens of searches that are typical in e-discovery. As a result, anyone trying to follow the best practices of testing, sampling, and refining each search will find themselves missing deadlines and running over budget because it takes so long. This also makes collaboration with the opposing party close to impossible, since there’s little time to iterate on – and agree upon - a set of keyword searches.
  3. Manual documentation. It’s not enough for producing parties to use best practices; they have to document them so that they can “show their work” to the court. Currently, documenting the search refinement process is mostly manual, with the result that it is either done inadequately or not at all.

The reason why the search technology used for e-discovery has these problems is surprisingly simple: it’s because the technology was not designed for e-discovery in the first place. Rather, it was built for enterprise search, and was only later repurposed towards e-discovery.

The “Black Box” Of Enterprise Search

The core issue is that enterprise search technology has been designed to be a “black box”. Users enter a single search query into one end, and get results at the other, with no visibility into what happens in between. Going back to our previous example, when a user searches for “divers*” intending to find documents related to “diversity” or “diverse”, enterprise search engines give the user no visibility into the crucial step of query expansion and how it expands the search query into relevant and non-relevant terms like “diversion” and “diversification”. As a result, the user has no ability to minimize the false positives.

In the same vein, when a user enters multiple queries into a “black box” enterprise search engine, all of the queries run as a single search, and the user has no visibility into which results are associated with which query. For example, a user that searches for “hiring OR interview” will get the results for the combination of the queries “hiring” and “interview”. He or she won’t know that only 5 documents contained “hiring” while 100 documents contained “interview.”  This limitation makes analyzing, sampling and refining searches costly and time consuming.
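By way of contrast, here is a rough Python sketch of the per-query visibility that a combined “hiring OR interview” search hides. It is not how any particular enterprise search engine works. The function name `per_query_breakdown`, the simple word tokenizer, and the sample documents are assumptions made for illustration.

```python
import re


def per_query_breakdown(queries, documents):
    """Report the matching documents for each keyword and for their OR combination."""
    tokens = {
        doc_id: set(re.findall(r"[a-z]+", text.lower()))
        for doc_id, text in documents.items()
    }
    per_query = {
        query: sorted(doc_id for doc_id, words in tokens.items() if query in words)
        for query in queries
    }
    # The combined result is the union, so a document matching both is counted once.
    combined = sorted(set().union(*per_query.values()))
    return per_query, combined


# Hypothetical sample data, invented for this illustration.
documents = {
    "doc1": "Hiring plan for the spring",
    "doc2": "Hiring freeze announced",
    "doc3": "Interview schedule for the hiring committee",
    "doc4": "Quarterly budget review",
}
per_query, combined = per_query_breakdown(["hiring", "interview"], documents)
for query, ids in per_query.items():
    print(query, len(ids), ids)
print("hiring OR interview", len(combined), combined)
```

Note that the per-query counts need not add up to the combined count, because a single document can match more than one keyword. That overlap is exactly the kind of detail a user cannot see when only the combined result is returned.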

That’s not to say that enterprise search products like Autonomy or Endeca are flawed. Far from it.  Their “black box” design works exceedingly well for the simple and quick queries that people want to run across the enterprise for general business purposes. If a sales manager is looking for a single proposal for her meeting the following day, then she doesn’t care how the search was performed or if it’s over-inclusive.  She’s only interested in the first page of relevant results, and for that use case enterprise search engines do a great job.

But e-discovery is a whole different world.  In e-discovery, users typically must review every single document in the search results, not just the most relevant ones.  As a result, over-inclusive searches can dramatically increase the costs of downstream production and review.  And under-inclusive searches raise the issue of defensibility.  Finally, e-discovery users have to run a lot of search queries and understand which documents are associated with each of those queries.

So, going back to the original problem, if current search technologies cannot help lawyers and litigation support professionals follow Judge Grimm’s guidance and address the “well-known limitations” of keyword search, what can? That will be the subject of my next post.