Archive for the ‘ediscovery’ Category

Demystifying Concept Search in Electronic Discovery

Tuesday, October 28th, 2008

Concept or content search continues to be a hot topic within the e-discovery community.  A continuous stream of articles discusses it, some pointing out its strengths and others its limitations.  The courts have also joined the discussion.  Judge Grimm refers to concept search in e-discovery in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  Judge Facciola discusses concept search in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 242 F.R.D. 139, and in other opinions.  Despite (or maybe because of) all the commentary on this topic, I find that while many people believe concept search is valuable in e-discovery, many are not sure exactly what it is or how it is practically useful.  After several years of commentary and hype, concept search has clearly become something of a buzzword associated with many myths and misconceptions.  To clarify what concept search is and how it can help in e-discovery, I want to dispel two of the most common myths I have heard.

The “Concept Search is Concept Search” Myth

The first myth concerns what concept search actually is.  In my experience, people tend to lump two different technologies together under that name: concept search and concept categorization.  It’s very common, for example, to see commentators say concept search when what they are really talking about is concept categorization.  To make matters more confusing, people also use a plethora of other names, including content search, content clustering and concept clustering, when what they really mean is concept categorization.

So, what are the differences between concept search and concept categorization?  Let’s start with concept search.  Concept search technologies find documents containing “concepts”.  I think the Sedona Conference’s “Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery” provides a good definition of “concept” when used in a search context: “the combination of [a] query term and the additional terms identified by the thesaurus.”  In other words, concept search technologies find documents containing a specified term plus additional terms with similar meanings derived from a thesaurus.
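To make that definition concrete, here is a minimal Python sketch of thesaurus-based expansion. The thesaurus, the `expand_query` and `concept_search` functions, and the sample emails are all hypothetical illustrations, not any vendor’s product or the Sedona commentary’s method. It matches single words only and ignores phrases, stemming and ranking.

```python
import re

# Hypothetical, hand-built thesaurus. Real concept search tools derive these
# relationships from their own thesauri; this is only an illustration.
THESAURUS = {
    "contract": ["agreement", "deal"],
    "fire": ["terminate", "dismiss"],
}


def expand_query(term: str, thesaurus: dict[str, list[str]]) -> set[str]:
    """Return the 'concept': the query term plus its thesaurus terms."""
    term = term.lower()
    return {term, *thesaurus.get(term, [])}


def concept_search(term: str, documents: dict[str, str],
                   thesaurus: dict[str, list[str]]) -> list[str]:
    """Return IDs of documents containing any single word of the expanded concept."""
    concept = expand_query(term, thesaurus)
    hits = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & concept:  # any overlap between document words and the concept
            hits.append(doc_id)
    return hits


docs = {
    "email-1": "Please sign the attached agreement by Friday.",
    "email-2": "Lunch on Thursday?",
    "email-3": "We will terminate the contract next month.",
}

print(concept_search("contract", docs, THESAURUS))  # ['email-1', 'email-3']
```

A plain keyword search for “contract” would return only email-3; the thesaurus expansion is what pulls in email-1, which never uses the word.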

Concept categorization, on the other hand, is not actually a search technology at all.  Concept categorization technologies do not “find” documents.  Rather, they categorize or group documents based on their similarity.  There are many different ways to group documents based on similarity.  Techniques include statistical (which assesses similarity based on word frequency), Bayesian classification (which weights words differently depending on factors in addition to statistical frequency, such as where the terms appear in a document), and semantic indexing (which takes into account the fact that many words used in a similar context may have a similar meaning).  Describing these technologies in detail is beyond the scope of this post, but the Sedona commentary has a good summary of them if you are interested in learning more.
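As a rough illustration of the statistical approach only (not Bayesian classification or semantic indexing), the sketch below compares word-frequency vectors with cosine similarity and greedily groups documents that look alike. The function names, the 0.3 similarity threshold and the sample documents are assumptions made for this example. Notice that there is no query anywhere: documents are simply grouped against each other.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Word-frequency vector for a document (lowercased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-frequency vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def group_documents(docs: dict[str, str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: each document joins the first group whose seed
    (first member) is at least `threshold` similar; otherwise it starts a new group."""
    vectors = {doc_id: term_vector(text) for doc_id, text in docs.items()}
    groups: list[list[str]] = []
    for doc_id, vec in vectors.items():
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:  # no existing group was similar enough
            groups.append([doc_id])
    return groups


docs = {
    "a": "Shipping delay on the steel order",
    "b": "The steel order shipping is delayed again",
    "c": "Holiday party menu and party games",
}
print(group_documents(docs))  # [['a', 'b'], ['c']]
```

Real categorization tools also discount common words such as “the”, weight rarer terms more heavily and use far more robust clustering; this only shows the basic idea of grouping by word-frequency similarity.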

As should now be apparent, these technologies are very different, and using the same words to describe them is confusing.  So it’s not surprising that many users of e-discovery services and software don’t have a strong understanding of what these technologies are or what benefits they can actually provide in practice.  Dispelling the myth that they can be lumped together is a critical first step in any conversation about concept search and how it can help in e-discovery.  This leads us to a second myth: that concept search is better than keyword search.  I’ll discuss this in my next blog post.

How Will The Financial Crisis Impact E-Discovery?

Sunday, October 26th, 2008

A couple of weeks back, I attended a now-infamous meeting at Sequoia Capital, which has since been widely covered in the press and the blogosphere. For those unfamiliar with Sequoia, it is the world’s leading venture capital firm, with a string of early-stage investments in companies such as Apple, Cisco, and Google as well as, more recently, AdMob, Clearwell, and Loopt. The presentation says it more colorfully, but Sequoia’s point is simple: “We are at the beginning of a global economic slowdown that could last for years, and the cost of capital has sky-rocketed. In light of that, everyone needs to re-evaluate their growth plans and, if necessary, reduce expenses immediately.”

That message sent a chill through Silicon Valley. In the days that followed the meeting, several start-up companies announced layoffs, closely followed by larger companies like eBay and Yahoo, all citing economic conditions in the wake of the financial crisis. So naturally, the meeting and its aftermath got me thinking about what impact our current economic malaise will have upon the e-discovery industry.

If history is any guide, economic downturns lead to more litigation, and more litigation leads to more e-discovery. That’s why e-discovery has often proven to be a counter-cyclical business, and that certainly appears to be the case again now. While traditional technology companies like SAP and Seagate missed their numbers last quarter, the top e-discovery software companies posted strong results. And many lawyers are expecting even better times ahead, if last week’s ACC show or the recent Fulbright & Jaworski 2008 Litigation Trends Survey is any indicator. In particular, the survey results were quite striking, with more than one-third of companies surveyed predicting more lawsuits, and a quarter forecasting more regulatory inquiries. This makes sense given that what we are facing is no “normal” recession; rather, it’s a downturn triggered by the sudden and widespread collapse of the banking sector, which has left many people wanting legal redress for their grievances.

But, more important than any short-term increase in litigation, I think the real significance of the current crisis is that it will spur a sustained, long-term increase in demand for e-discovery solutions. As revenue growth slows, companies will focus on reducing costs to maintain profit growth. That will prompt many of them to examine the vast amounts of money being spent on e-discovery and accelerate the pace at which they use technology to cut costs by bringing elements of e-discovery in-house. Law firms and litigation support service providers will similarly find their invoices attracting greater scrutiny. Their old habit of taking terabytes of data and dumping it into a linear review platform without first removing irrelevant or unresponsive data will look increasingly profligate.

To learn more about how best to prepare for the coming wave of litigation, and the associated increase in e-discovery, I strongly recommend next week’s webinar with Ron Best from Munger, Tolles, and Olson (MTO). Ron is a real innovator in this area, with extensive experience in complex, multi-party litigation. He is also full of practical advice about how best to rein in e-discovery costs and manage with limited resources, skills that will be increasingly important in the coming months.

No industry is an island and, to some extent, we all get impacted by the same economic forces. But the unique thing about the e-discovery industry is that the worst of times can often be the best of times. Consider it a silver lining to the very large cloud hanging over our economy.

The “Artful” E-Discovery Dodger

Monday, October 13th, 2008

E-discovery search has become a hot topic of late (in blogs and in the news), and I think it’s pretty clear that the unwashed (attorney) masses still don’t really grok the importance of using a defensible search protocol.  Nor do they seem to understand the enhanced scrutiny the judiciary is applying.

Kipperman v. Onex Corp., 2008 WL 4372005 (N.D. Ga. Sept. 19, 2008) is another in what will assuredly be a long string of cases that demonstrate how easy it is for litigators to get wrapped around the axle of e-discovery search.  In Kipperman, the defendant (Onex) presented several motions to the court, including attempts to obtain relief from the need to produce email identified after searching several backup tapes.

During a previous hearing the court ordered Onex to search all the mailboxes on two tapes, as well as on an additional tape selected by Plaintiff. The court determined that despite Onex’s objections and representations, the backup tapes were “producing meaningful discoverable information.”  The court was nevertheless sympathetic to Onex’s burden and therefore weighed in with some guidance:

“The court did suggest, … , that Plaintiff be more artful with its search terms and that Plaintiff utilize a list of the people, provided by Defendants, to review whether all mailboxes needed to be searched.”

The court also gave Onex the chance to narrow the search terms.  Unfortunately, Onex didn’t seize the opportunity to provide either a narrower list of people or refined search terms.  “As such, they agreed to search and restore all the mailboxes with the search terms provided by Plaintiff.”

Not surprisingly, Onex then sought relief from having to review and produce all of the search results, because the “broad search terms resulted in thousands and thousands of irrelevant hits.”  For example, the search terms included the word “republic”, which was meant to capture emails regarding Republic Builders Products, one of the companies involved in this matter.

“Defendants claim that the search captured thousands of irrelevant pages due to one occurrence of the word ‘republic’ often related to Onex business interests having nothing to do with Magnatrax in the ‘Republic of France,’ ‘Republic of Ireland,’ and ‘Czech Republic’.”

Again the court reaffirmed its sympathy for Onex’s burden yet denied the requested relief, in large part because Onex had already been given the chance to be more “artful”:

“[T]he court is not unsympathetic to the massive amount of discovery involved in this matter, the considerable burden of working with it, and the overproduction that often comes with e-mail production. Therefore, the court gave Defendants numerous tools by which to reduce the burden of e-mail discovery, including an opportunity to limit Plaintiff’s search terms and an opportunity to provide a list by which the number of peoples and the number of boxes being searched could be reduced. Defendants did not take advantage of these opportunities. Defendants must now lie in the bed that they have made. Thus, Defendants’ objections on the basis of relevancy and volume are DENIED.” (emphasis added).

Kipperman is probably not atypical.  Attorneys everywhere have historically used blunt e-discovery search instruments and have rarely run afoul of the judiciary.  Now, after Victor Stanley and similar decisions, the playing field has changed dramatically.  It’s important to leverage best practices (from Sedona and others), craft a defensible search strategy, sample the results and “show your work.”  Missteps along the way, especially ones that the court has tried to help the parties avoid, won’t be met with much tolerance.
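Part of “showing your work” is sampling what a term actually hits before committing to review. The sketch below replays the Kipperman “republic” problem on a handful of invented emails (they are not from the case record): a broad pattern hits every message, while a hypothetical refinement ignores the country names Onex cited and keeps the rest. The sample text, the `refined_hit` function and the exclusion pattern are assumptions for illustration only.

```python
import re

# Hypothetical sample emails, invented for illustration (not from Kipperman).
SAMPLE = [
    "Republic shipped the roofing panels to the Magnatrax plant.",
    "Board dinner in the Republic of France next week.",
    "Tax filing for our Republic of Ireland subsidiary.",
    "Czech Republic site visit is confirmed.",
    "Call Republic about the overdue invoice.",
]

# The broad term, as agreed: any occurrence of "republic".
BROAD = re.compile(r"\brepublic\b", re.IGNORECASE)

# Hypothetical "more artful" refinement: the country-name phrases Onex cited.
NOISE = re.compile(r"\brepublic of (france|ireland)\b|\bczech republic\b", re.IGNORECASE)


def refined_hit(text: str) -> bool:
    """True if 'republic' still appears after the country-name phrases are removed."""
    return bool(BROAD.search(NOISE.sub("", text)))


broad_hits = [t for t in SAMPLE if BROAD.search(t)]
refined_hits = [t for t in SAMPLE if refined_hit(t)]
print(len(broad_hits), len(refined_hits))  # 5 2
```

Because the refinement strips only the country-name phrases rather than excluding whole documents, an email that mentions both the Republic of Ireland and Republic the company is still caught.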

E-Discovery In The Press

Thursday, October 2nd, 2008

Last month, for the first time, friends of mine who do NOT work in the legal industry started talking to me about e-discovery. In the past, they had always taken on the glazed look of a bored 8th-grader whenever I spoke about what I do. But suddenly, they were strangely interested and full of questions.

The reason was two articles about e-discovery in the mainstream media which appeared within a week of each other. The first was in the Wall Street Journal, which wrote about how tech firms are at war with lawyers. According to the Journal, the fact that companies are saving money by using e-discovery software is bad news for lawyers, since they are “facing the loss of lucrative client fees.” In response, the lawyers are fighting back: “The attorneys counter that there are pitfalls to replacing them. Early this year, a federal judge required chip maker Qualcomm to pay rival Broadcom more than $8 million after it failed to uncover and share emails relevant to a case.”

I am sure there are lawyers who see technology as a threat, but the firms I deal with are actively embracing e-discovery technology, not fighting it. They see it as another way they can add value to their clients, and would prefer to have their staff focused on practicing law, not mindlessly reading irrelevant documents. So I ended up spending a lot of time explaining to my non-legal friends that there are two sides to the coin. As for my friends who do happen to be lawyers, they focused on the Qualcomm case, pointing out (as we have written before) that the problem was not technology, but rather poor processes and bad judgment on the part of the attorneys concerned.

The second article appeared in the Economist and took a different tack. It argued that the stratospheric cost of e-discovery is gumming up the court system and preventing justice from being served. According to one former justice from Colorado quoted in the article, even mundane landlord-tenant disputes “are now digital wars of attrition”; there are “cases that are settled only because one party cannot afford the costs of e-discovery”; and, many “plaintiffs cannot afford to sue at all, for fear of the e-discovery costs.”

I love the Economist’s tongue-in-cheek style and thought the article made many valid points. My one disappointment was that its spin was unequivocally negative, as though e-discovery were a self-inflicted wound on the American judicial system. Nowhere did it mention that electronic evidence often helps litigants get at the truth. Unlike incomplete recollections or “he said, she said” claims and counter-claims, an email that captures a person’s words and actions in black and white is hard to dispute. Nor was there any mention of how technology is solving the problems it inadvertently created: today, there are many products that rapidly sift through electronic information, dramatically lowering the cost of e-discovery.

It is great for everyone in the e-discovery community that our field is getting more ink in mainstream, quality publications. I expect the trend to continue as the industry grows, especially once investigations into our current financial meltdown begin.

“Aggressive Culling”: The E-Discovery Buzz Cut

Tuesday, September 30th, 2008

Ralph Losey, never one to mince words, recently analyzed a litigation survey from the elite Fellows of the American College of Trial Lawyers. The survey highlights the fact that one of the main problems facing the U.S. legal system today is (surprise!) e-discovery. Also (not) a surprise: the study “places the blame squarely on poor rules, bad law, and judges”, while overlooking the role that lawyers play in the problem.

In his analysis, Ralph makes a number of insightful observations that should help lawyers move from being e-discovery troublemakers to being part of the solution. However, one of his key critiques is targeted not at lawyers but rather at the vendor community: “[E-discovery] is too expensive because lawyers and judges do not know what they are doing, and do not know how to properly cull and review email, and because clients are disorganized pack-rats. Many of the e-discovery vendors are also misinformed, but often they do know better; they just have no pecuniary interest in aggressive culling. Some may even seek to line their own pockets in inflated discoveries.”

As Ralph bluntly points out, pecuniary interests (translation: money) play a big role here, but so does risk reduction. Imagine you’re given the opportunity to process a 2-terabyte case all the way through to review. With the “funnel” of e-discovery costs placing the highest cost per gigabyte at the end of the process (i.e., review), what’s your incentive to cull aggressively at the beginning? Not much from a revenue perspective, certainly, but also not much from a risk perspective, particularly when you have sanctions and lawsuits on your mind and are thinking about the liability you could incur by excluding potentially relevant documents with too broad a brush (or pair of garden clippers) in your pruning.

How do we move forward? As document volumes continue to grow, it’s clear that aggressive culling (with a few caveats which we’ll get to in a minute) is a critical tool for managing costs and improving case outcomes (let’s go out on a limb and define “improving” as producing fairer and more equitable rulings). However, in order to adopt more aggressive culling as a standard part of the electronic discovery process, the community has to come to terms with three things:

  • The Myth of Perfection: There may be perfect abs, but there is no perfect e-discovery. Organizations like the E-Discovery Institute are doing fantastic work to measure and improve the accuracy of electronic discovery efforts, but in the end it’s tough to argue that having 100 contract attorneys manually review 10 million documents will necessarily produce a better overall e-discovery outcome than having 10 specialized attorneys review 200,000 documents that were aggressively (but thoughtfully) culled from the initial 10-million-document set. There simply is no black-and-white set of rules that will lead to a perfect process.
  • The Benefit of Cost Control: Given that, it is in the best interest of everyone involved (yes, even vendors) to choose the most cost-effective process that provides a high likelihood of producing the information relevant to the case.  This means “saving your bullets”: not spending all of your e-discovery dollars up front pursuing the myth of perfection, but instead approaching discovery incrementally, adapting to changing facts and circumstances as the matter unfolds. How, you may ask, do vendors benefit? They can become more strategic e-discovery advisors by working with counsel over the full lifecycle of a case, providing higher-value (and, by the way, more interesting and intellectually challenging) consulting services to help incrementally adjust and adapt the course of e-discovery. As Ralph puts it: “…Trial lawyers should accept that specialists in the field of e-discovery are a necessary evil. If an e-discovery specialist knows the field, they can save you money and take you out of the e-discovery morass faster and more reliably than a dozen new rules. The world today is too complex for one man or woman to do it all.”
  • The Value of Defensibility: Many of you likely winced at the term “high likelihood” in the previous point. “Sacrilege!” you cried. “I demand certainty!” First, go back and re-read the first point about the Myth of Perfection. Then, consider that a better way forward may be an approach to e-discovery that involves more aggressive culling early in the process to focus on the most important documents first, more iterations to adapt to changing facts and circumstances, and, all along the way, a complete audit trail that provides defensibility in the event that any aspect of the process is ever questioned. Such defensibility would include specific documentation about the culling decisions that were made, down to the keyword and “sub-keyword” (i.e. wildcard expansion) level, so all the cards are on the table for everyone to see.  The value of defensibility when performing aggressive culling is enormous, in that it adds an additional measure of safety and trust to the process, minimizing the amount of doubt and second-guessing that so often plagues e-discovery negotiations.
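Documenting culling decisions down to the “sub-keyword” level can be as simple as recording what each wildcard actually expanded to. The minimal sketch below assumes a toy in-memory word index and uses Python’s standard `fnmatch` module for `*` wildcards; the function names and record fields are hypothetical, and commercial platforms have their own indexing and wildcard syntax.

```python
import fnmatch
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


def expand_with_audit(keyword: str, index: dict[str, set[str]]) -> tuple[set[str], list[dict]]:
    """Expand a wildcard keyword against the index vocabulary.

    Returns the union of matching document IDs plus one audit record per
    'sub-keyword' (each concrete term the wildcard matched).
    """
    pattern = keyword.lower()
    hits: set[str] = set()
    audit: list[dict] = []
    for term in sorted(index):
        if fnmatch.fnmatchcase(term, pattern):
            audit.append({"keyword": keyword, "sub_keyword": term,
                          "doc_count": len(index[term])})
            hits |= index[term]
    return hits, audit


docs = {
    "d1": "Draft contract attached",
    "d2": "Contractor invoice for March",
    "d3": "Contracts were signed",
    "d4": "Lunch plans",
}
hits, audit = expand_with_audit("contract*", build_index(docs))
for row in audit:
    print(row["keyword"], "->", row["sub_keyword"], row["doc_count"])
# contract* -> contract 1
# contract* -> contractor 1
# contract* -> contracts 1
```

Here the audit records show that `contract*` also pulled in “contractor”, exactly the kind of expansion an opposing party might want to question, and the record makes that decision visible rather than buried.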

By coming to terms with the fundamental imperfections of the e-discovery process and embracing the promise of lower costs and the agility and responsiveness that can be gained with a more iterative approach, everyone stands to gain from the safe and controlled adoption of aggressive culling – yes, even the vendors (at least the smart ones) and their ever-present pecuniary interests.