Posts Tagged ‘Sedona’

Electronic Discovery Services: The Price is Right?

Wednesday, June 17th, 2009

Maybe this will show my age, but I’ve been around the electronic discovery business since the days when pricing was both simple and very expensive. Terabytes were at the mythical high-end of the spectrum and gigabytes of “e-docs” (not “ESI”) cost $3,000 – $4,000 to process. Understandably (and fortunately for most), pricing models have evolved, thanks in part to more educated consumers and initiatives such as Sedona’s RFP + Vendor Panel.

Leaving the WABAC machine and moving into present times, we’re starting to see some variance from traditional pricing models that primarily focus on data “into” the processing machine. More and more companies (such as Kroll Ontrack) are moving to models that price on data “out” of the process. Since that’s a bit nebulous, an example might illustrate:

Traditionally, in a somewhat simplified fashion, an electronic discovery project would be priced by the amount of data in the initial corpus (say 100 gigabytes) and processing would be priced at $500 a gigabyte (for round numbers purposes). Leaving out the sometimes significant caveat that the 100 gigabytes would likely increase due to expansion of compressed files, this would mean that the bulk of the project expenses would be $50,000 ($500 x 100), plus relatively nominal costs for monthly hosting and user access rights.

At the end of the day, after elimination of system files, deduplication and application of search terms (reducing the initial corpus by say 70% collectively) there would be 30 gigabytes remaining for hosting and possible production, both of which are most often priced separately.

Given rampant commoditization there’s an arms race underway among certain service providers where they’re now changing the above model to give away initial processing as a loss leader – pricing only on the data that comes out the end of the processing/search step. In this approach the above workflow would largely stay the same, but the vendor would charge a higher rate for what ultimately is hosted on the back-end. If this back-end fee was $2,000 per resulting gigabyte and the same 30 gigabytes was seen out the back end, then the customer would pay $60,000 for the project. But, if the deduplication, searching, culling, etc. was more effective (at say 80%) then the resulting 20 gigabytes would only cost $40,000.

The question then, as Clint Eastwood would put it, is: “Do you feel lucky?” This pricing model forces attorneys and litigation support managers to guesstimate what culling, search, and de-duplication rates they’ll likely get on the data corpus. Guess right and they save the end client money, guess wrong and they’re way over budget.
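To put numbers on that gamble, here is a minimal sketch that compares the two pricing models using the round figures from the example above (100 gigabytes in, $500 per gigabyte on the front end, $2,000 per gigabyte on the back end). The function names are made up for illustration, and real vendor quotes would add hosting, user access fees and data expansion, all of which are ignored here.

```python
def input_priced_cost(corpus_gb, rate_per_gb_in):
    """Traditional model: pay for every gigabyte that goes into processing."""
    return corpus_gb * rate_per_gb_in


def output_priced_cost(corpus_gb, cull_pct, rate_per_gb_out):
    """Back-end model: pay only for the data left after culling.

    cull_pct is the combined reduction from system-file removal,
    deduplication and search terms, expressed as a percentage (0-100).
    """
    remaining_gb = corpus_gb * (100 - cull_pct) / 100
    return remaining_gb * rate_per_gb_out


def break_even_cull_pct(rate_per_gb_in, rate_per_gb_out):
    """Cull percentage at which both models cost exactly the same."""
    return 100 * (1 - rate_per_gb_in / rate_per_gb_out)


if __name__ == "__main__":
    # Figures taken from the worked example in the post.
    corpus_gb, rate_in, rate_out = 100, 500, 2000

    front_end = input_priced_cost(corpus_gb, rate_in)
    print(f"Input-priced model: ${front_end:,.0f}")

    for cull_pct in (70, 75, 80):
        back_end = output_priced_cost(corpus_gb, cull_pct, rate_out)
        print(f"Output-priced model at {cull_pct}% culling: ${back_end:,.0f}")

    print(f"Break-even culling rate: {break_even_cull_pct(rate_in, rate_out):.0f}%")
```

With these figures the two models cost the same when culling removes 75% of the corpus (25 gigabytes remaining at $2,000 each equals the $50,000 front-end price). Below that rate the back-end model costs more; above it, less.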

The dynamics of this purchasing decision are a bit atypical because the buyer (usually counsel) doesn’t pay the bills, so the decision can often be more vexing than most. When a direct consumer gambles on pricing, things will ideally balance out over time, with money being saved in some instances and overspent in others. But, when the buyer doesn’t pay the bills, the motivation is less clear.

Thoughts run to Maslow’s hierarchy of needs to determine which pricing model is ultimately more compelling: (a) price certainty/adherence to budget, or (b) cost variability and the opportunity to save money. While it’s never good to understate the upside of saving money (Esteem), I think ultimately there’s a more fundamental need (Safety) to stay within budget and avoid the painful (sometimes client imperiling) call to discuss how a given e-discovery project has gone way over budget.

This calculation is made further vexing because it not only pits the purchasing party against unknown data culling/searching rates, but it also puts the vendor in an ethical bind: they make less money if they’re supremely effective at data reduction, whereas if they’re either intentionally or accidentally beneficiaries of relatively little data reduction, they stand to make considerably more.

It’s like you went to Vegas to gamble your kid’s college fund and on top of the already questionable house odds you knew that the dealer stood to profit by your losses. So, as for myself, no, I don’t feel lucky.

A Gross Inability to Craft Electronic Discovery Searches

Thursday, April 9th, 2009

The bashing of our judicial system seems to have reached a fever pitch.  Groups like the American College of Trial Lawyers (“ACTL”) have proclaimed in a recent report that while the “civil justice system is not broken, it is in serious need of repair.”  The blame game seems to have judges and attorneys alike pointing fingers.  The Fellows of the ACTL (perhaps not surprisingly) seem to pin some of the blame on the judiciary:

“Judges should have a more active role at the beginning of a case in designing the scope of discovery and the direction and timing of the case all the way to trial. Where abuses occur, judges are perceived not to enforce the rules effectively.”

Groups like the Sedona Conference chalk up many of the ills to the failure to cooperate, so much so that they’ve orchestrated a cooperation proclamation – which has picked up enough support by the bench to have garnered several cites in the case law (see e.g., Mancia).

The bench for its part seems to put some of the onus on litigators and their reluctance to get with the times.  William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 2009 WL 724954 (S.D.N.Y. Mar. 19, 2009) is the latest example of such a proclamation.  In this construction defect case, Judge Peck (a Sedona devotee) issues what he hopes will be a “wake-up” call to the bar about the need for “careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information (‘ESI’).”  In Gross, the court had to mediate an e-discovery dispute where the requesting parties propounded a blatantly over-inclusive search request.  Unfortunately, the responding entity (Hill) was a non-party and simply buried its head in the sand.  This left the Court, in order to facilitate a resolution, in the “uncomfortable position” of having to craft a “keyword search methodology for the parties, without adequate information from the parties (and Hill).”

Judge Peck’s exasperation with these antics was palpable.  Summing up the problem by citing Judge Grimm and Victor Stanley he stated: “This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails.”  He further noted: “[w]hile this message has appeared in several cases from outside this Circuit, it appears that the message has not reached many members of our Bar.”

After noting both Sedona and Judge Facciola (of O’Keefe and Equity Analytics fame) Peck’s opinion reached a crescendo:

“Electronic discovery requires cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI. Moreover, where counsel are using keyword searches for retrieval of ESI, they at a minimum must carefully craft the appropriate keywords, with input from the ESI’s custodians as to the words and abbreviations they use, and the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives.’ It is time that the Bar-even those lawyers who did not come of age in the computer era-understand this.”

While it’s easy to see who Peck blames in this brouhaha, it takes (at least) two to tango.  That means litigants on both sides of the “v” must move beyond the typical “seat of the pants” electronic discovery wrangling.  And, judges need to be savvy enough to spot the issues to help/force the parties into such an enlightened/cooperative state.  Nothing short of that will get the job done.

Task Force Finds Electronic Discovery Process in Need of “Serious Overhaul”

Friday, March 27th, 2009

The American College of Trial Lawyers Task Force on Discovery (“Task Force”) recently came out with its final report based on its survey of the Fellows of the American College of Trial Lawyers (“ACTL”).  The project was conceived as an “outgrowth of increasing concerns that problems in the civil justice system, especially those relating to discovery, have resulted in unacceptable delays and prohibitive expense.”  After releasing an interim report, the Task Force issued its final say on the topic, which homed in on three major themes borne out by the Survey:

1. Although the civil justice system is not broken, it is in serious need of repair. In many jurisdictions, today’s system takes too long and costs too much. Some deserving cases are not brought because the cost of pursuing them fails a rational cost-benefit test while some other cases of questionable merit and smaller cases are settled rather than tried because it costs too much to litigate them.

2. The existing rules structure does not always lead to early identification of the contested issues to be litigated, which often leads to a lack of focus in discovery. As a result, discovery can cost far too much and can become an end in itself. As one respondent noted: “The discovery rules in particular are impractical in that they promote full discovery as a value above almost everything else.” Electronic discovery, in particular, needs a serious overhaul.

3. Judges should have a more active role at the beginning of a case in designing the scope of discovery and the direction and timing of the case all the way to trial. Where abuses occur, judges are perceived not to enforce the rules effectively. According to one Fellow, “Judges need to actively manage each case from the outset to contain costs; nothing else will work.”

In short, the Survey revealed widely-held opinions that there are serious problems in the civil justice system and that the discovery process, though not broken, is “badly in need of attention.”  While not cited specifically, a recent case highlights many of the Survey’s observations.  In Fannie Mae Sec. Litig., 552 F.3d 814 (D.C. Cir. 2009) the Office of Federal Housing Enterprise Oversight (OFHEO) responded to a third-party subpoena and in the process incurred $6M in electronic discovery expenses.  While this case had a number of procedural nuances that fortunately make its holding fairly limited to the facts, this electronic discovery fiasco certainly is a poster child for a discovery process that is bursting at the seams.

The $6M problem started for the OFHEO when the individual defendants became skeptical of a limited production and obtained a Rule 30(b)(6) deposition, which confirmed that OFHEO had failed to search all of its off-site disaster-recovery backup tapes.  This inquiry led the OFHEO to enter into a stipulated order to avoid further contempt hearings.  As part of the stipulated order, the individual defendants submitted over 400 search terms, which covered over 600,000 documents.  Overwhelmed with the burden of conducting such a search and the need to hire 50 contract attorneys, the OFHEO objected that the list of search terms was “tantamount to a request for the dictionary,” since it resulted in a “retrieval of approximately 80 percent of the office’s emails.”  Unfortunately, the court ultimately held that the OFHEO needed to comply with the terms of the stipulated order even though the cost was a staggering “9 percent of the agency’s entire annual budget.” To add insult to injury, and despite their efforts, the OFHEO was found in contempt and sanctioned for not meeting the agreed upon discovery deadlines.

This $6M example brings us back to the Survey and the findings of the Task Force.  They proposed a set of Principles (modeling and citing the Sedona Working Group) that would “shape solutions to the problems they have identified.”  Several relating to e-discovery stand out…

  • Promptly after litigation is commenced, the parties should discuss the preservation of electronic documents and attempt to reach agreement about preservation. The parties should discuss the manner in which electronic documents are stored and preserved. If the parties cannot agree, the court should make an order governing electronic discovery as soon as possible. That order should specify which electronic information should be preserved and should address the scope of allowable proportional electronic discovery and the allocation of its cost among the parties.
  • Electronic discovery should be limited by proportionality, taking into account the nature and scope of the case, relevance, importance to the court’s adjudication, expense and burdens.
  • The obligation to preserve electronically-stored information requires reasonable and good faith efforts to retain information that may be relevant to pending or threatened litigation; however, it is unreasonable to expect parties to take every conceivable step to preserve all potentially relevant electronically stored information.
  • Absent a showing of need and relevance, a party should not be required to restore deleted or residual electronically-stored information, including backup tapes.
  • Sanctions should be imposed for failure to make electronic discovery only upon a showing of intent to destroy evidence or recklessness.
  • The cost of preserving, collecting and reviewing electronically-stored material should generally be borne by the party producing it but courts should not hesitate to arrive at a different allocation of expenses in appropriate cases.
  • In order to contain the expense of electronic discovery and to carry out the Principle of Proportionality, judges should have access to, and attorneys practicing civil litigation should be encouraged to attend, technical workshops where they can obtain a full understanding of the complexity of the electronic storage and retrieval of documents.

As Oscar Goldman said about Steve Austin, the legendary $6M man: “Gentlemen, we can rebuild him. We have the technology…” The electronic discovery “quagmire” appears to need the same type of radical makeover.  Data is proliferating at a rate far greater than the e-discovery competency of litigators and judges alike.  Tools are out there that can help tackle the proliferation problem, but the need for, and ultimate use of, such tools must be appreciated by counsel on both sides of the “v.”  Until notions of proportionality and cooperation start becoming common parlance for both litigators and judges we will unfortunately continue to see more $6M examples like Fannie Mae.

The Electronic Discovery Sheriff Is Back In Town

Thursday, January 29th, 2009

As Tiger Woods is to golf, the honorable Shira A. Scheindlin is to electronic discovery.  She has unquestionably been the most dominant/visible/outspoken jurist in the electronic discovery realm over the past decade, penning amongst others, the Zubulake opinion, which is commonly referred to as the gold standard in electronic discovery.

But, like Woods, who recently took a sabbatical to mend his surgically repaired knee, Judge Scheindlin has recently been eclipsed by several other notable electronic discovery jurists, namely Judge Grimm (of Victor Stanley and Mancia fame) and Judge Facciola (aka “the Italian Stallion”), both of whom made numerous “best of the year” electronic discovery case law lists.

With Securities and Exchange Commission v. Collins & Aikman Corp., 2009 WL 94311 (S.D.N.Y., Jan. 13, 2009) Judge Scheindlin serves notice that the sheriff is back in town.  She not only tackles a number of thorny electronic discovery topics, but ambitiously takes on the US government in the process.  It’s a fairly lengthy opinion, well worth the read, so I’ll just excerpt a few of the notable takeaways.

As a bit of background…  the Collins case centered around a securities fraud complaint brought by the SEC against the Collins & Aikman Corp. and its former CEO David A. Stockman.  The crux of the dispute surrounded questions concerning the government’s discovery obligations in civil discovery (versus in a purely SEC investigation per se).

There were four distinct but interrelated disputes, namely:

“(1) Whether identifying responsive documents that have been organized by the producing party invades the protection accorded to attorney work-product and how a government agency-acting in its investigative capacity-must respond to a request for the production of documents. (2) Whether a government agency may unilaterally restrict the scope of its search based on an assertion of an “undue burden” on limited public resources. (3) How much information the Government must disclose in order to allow an adversary-and the court-to assess an objection based on the deliberative process privilege. (4) Whether a government agency may unilaterally exclude its own e-mail from document production on the ground that most-but not all-will be privileged.”

Addressing the work product claims, the court found against the government, again reinforcing several recent opinions about electronic discovery search:

“The SEC contends that Stockman can search through the ten million pages and find substantially the same documents identified by the SEC without impinging on the thought processes of the SEC attorneys. Indeed-at significant expense and delay-Stockman could search the document databases using appropriate search terms, but the inaccuracy of such searches is by now relatively well known.  A page-by-page manual review of ten million pages of records is strikingly expensive in both monetary and human terms and constitutes “undue hardship” by any definition.” [Citing George L. Paul and Jason R. Baron’s article, Information Inflation: Can the Legal System Adapt?]

After losing the first battle, the SEC argued that even if the compilations were not protected as work product, it could produce the “complete, unfiltered, and unorganized investigatory file” since this was how the documents were “maintained in the usual course of its business.”  This second attempt was similarly unpersuasive as Judge Scheindlin held that the “usual course of business” exemption did not apply:

“[C]onducting an investigation-which is by its very nature not routine or repetitive-cannot fall within the scope of the “usual course of business.” While the SEC routinely collects and maintains regulatory submissions such 10-K reports, in its investigative capacity the agency conducts tailored probes of a company or an industry, requiring the gathering of records from diverse sources. Many if not most of the 1.7 million documents in the SEC production here were likely collected in the agency’s investigatory role. Thus it is no surprise that the complete collection is maintained as it was collected-in large disorderly databases. The documents can only be provided in a useful manner if the agency organizes or labels them to correspond to each demand.”

Next, Judge Scheindlin addressed the SEC’s decision to “unilaterally” limit its search to “centralized compilations” which ultimately “turned up nothing.”  She found that the SEC’s “blanket refusal to negotiate a workable search protocol” was “patently unreasonable” citing both Mancia and the Sedona Conference’s Cooperation Proclamation:

“Rule 26(f) requires the parties to hold a conference and prepare a discovery plan. … Had this been accomplished, the Court might not now be required to intervene in this particular dispute. I also draw the parties’ attention to the recently issued Sedona Conference Cooperation Proclamation, which urges parties to work in a cooperative rather than an adversarial manner to resolve discovery issues in order to stem the ‘rising monetary costs’ of discovery disputes.”

As the coup de grâce, Judge Scheindlin addressed and rejected out of hand the SEC’s most untenable claim that it would not produce e-mail “generated or received by the Commission itself” because “nearly all responsive e-mails will be privileged, protected, or non-substantive.”

“Because e-mails are inherently searchable, the SEC’s blanket refusal to produce any in-coming or outgoing e-mails is unacceptable. Without even an attempt to negotiate search terms that would weed out privileged, protected, or irrelevant e-mails, the SEC cannot reasonably assert that a routine aspect of modern discovery-search and review of a party’s e-mail-is beyond its capability. Essentially, the SEC’s position is that the cost of such a search is simply too high, but it has made no effort to document the cost or the likelihood that it would produce relevant, nonprivileged material. The concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.”

At the end of the day, the Collins opinion seems to make the statement that Judge Scheindlin is back with a vengeance and she’s serving notice that the government isn’t above the law:

“Like any ordinary litigant, the Government must abide by the Federal Rules of Civil Procedure.”

Besides knocking the government down a peg, Judge Scheindlin throws her judicial weight behind a number of important but nascent trends, including the Sedona Cooperation Proclamation, the related need to meet & confer, the use of sampling and the challenges of electronic discovery search. While none of these notions are groundbreaking, her substantial backing means increasing clarity for lawyers and litigation support practitioners everywhere.  And, that’s certainly welcome.

Top 5 Cases That Shaped Electronic Discovery in 2008

Friday, December 12th, 2008

Picking five out of the sea of electronic discovery cases isn’t as easy as it sounds.  Sure, a few, like our “Case of the Year” will be no-brainers, but others aren’t as clear cut.  And, they’re certainly open to debate.  But, in my humble opinion here’s THE list, counting down David Letterman style:

5) Mancia v. Mayflower Textile Servs. Co., 2008 WL 4595175 (D. Md. Oct. 15, 2008)

If there ever was an opinion written by a judge to make a larger societal point, Mancia was certainly it.  Judge Paul Grimm, who’ll appear on this list in another slot as well, has clearly taken the mantle from Judge Scheindlin as the leading electronic discovery jurist.  He’d heretofore authored a number of significant opinions in this area, including Hopson and Thompson. Now, in Mancia he used a garden variety discovery dispute, which was typically rife with boilerplate objections and other obstreperous tactics, to highlight the Sedona Conference’s Cooperation Proclamation.

The lasting takeaway from the opinion is the notion that “[c]ourts repeatedly have noted the need for attorneys to work cooperatively to conduct electronic discovery, and sanctioned lawyers and parties for failing to do so.” To support this notion he cites the Sedona Conference Proclamation and the little used FRCP 26(g).  This opinion is noteworthy because it gives precedent to bolster the Sedona initiative and should provide a ready citation for all those counsel who aren’t getting the level of cooperation they need from the opposition.  It remains to be seen if other judges will follow suit, but this could be the beachhead for a more cooperative electronic discovery process in 2009 and beyond.

4) Flagg v. City of Detroit, 252 F.R.D. 346 (E.D. Mich. 2008)

Flagg highlights the growing need to reconcile the electronic discovery landscape, which typically focuses somewhat myopically on email, with the larger informational trends which are now characterized by the use of blogs, social networking sites, instant messaging, and text messaging.  Flagg was one of the first to determine text messages (e.g., messages exchanged among certain officials and employees of the City of Detroit via city-issued text messaging devices) were discoverable under the standards of FRCP 26(b)(1).  The holding further demonstrated the challenges of conducting electronic discovery across information systems that mix personal information with business communications.  This type of information commingling will continue to escalate, causing significant long term electronic discovery challenges due to thorny privacy, privilege and policy implications.

3) Rhoads Indus., Inc. v. Bldg. Materials Corp. of Am., 2008 WL 4916026 (E.D. Pa. Nov. 14, 2008)

Rhoads is one of the first cases decided after Federal Rule of Evidence (FRE) 502, which recently created a national standard (versus the previous split among jurisdictions) and now sets out a “middle ground” for resolving inadvertent disclosure during electronic discovery.  The key provision is (b)(2), which provides protection only if “the holder of the privilege or protection took reasonable steps to prevent disclosure.”  So, Rhoads took that “reasonableness” question head on in a scenario where the plaintiff Rhoads admittedly (yet inadvertently) produced over eight hundred privileged electronic documents.  The decision is significant because it used the five-factor test stated in Fidelity, but put undue weight on the final factor, which was: “whether the overriding interests of justice would be served by relieving the party of its errors.”   This approach potentially threatens the development of sound case law that will be necessary to help the deployment of FRE 502 into practice because it casts too much uncertainty with its weighting of “fairness” (a problematically vague notion) in the analysis.  It will be interesting to see if/how this approach is subsequently adopted as we enter the New Year.

2) Qualcomm Inc. v. Broadcom Corp., 2008 WL 66932 (S.D. Cal. Jan. 7, 2008)

This for many was the case of the year given its far-reaching implications for the legal community.  Some have argued that this isn’t an e-discovery abuse case per se, but more of an example of discovery abuses that just so happened to be centered around ESI.  In either case, the fraud, resulting cover-up, sanctions, ethical issues and privilege discussions made for insightful and thought provoking reading throughout 2008.  The lasting takeaway from Qualcomm appears to be the implications of not just committing discovery abuses, but the failure of having a well thought out e-discovery plan that is actively executed/monitored by outside counsel.  The resulting tension between outside counsel, inside counsel and the internal IT department may continue to escalate if more cases like this make the headlines in 2009.

1)  E-Discovery Case of the Year: Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008)

Judge Grimm’s hallmark opinion has had the legal community buzzing over the past several months and the reason appears pretty straightforward.  In Victor Stanley, Grimm builds on the holdings in Seroquel, O’Keefe and Equity Analytics to boldly cast doubt on a practice so routine that it’s literally shocked the legal community into reevaluation:

“[D]etermining whether a particular search methodology, such as keywords, will or will not be effective certainly requires knowledge beyond the ken of a lay person (and a lay lawyer) . . . .”

The notion that electronic discovery search is beyond the ability of most attorneys has caused tremors within the litigation support community, which had a long history of blindly receiving keywords from counsel, running them and turning back over the results – often blissfully unaware of the extent to which those keyword searches actually located relevant information.  Victor Stanley’s analysis of the “reasonableness” of search protocols also has an impact on FRE 502 and therefore cements its place alongside other e-discovery “must reads” such as Zubulake and Morgan Stanley.

The cases above are my Top 5.  What additional cases do you think were important?  Please let me know by commenting on the cases you think shaped electronic discovery in 2008 and why.


The Sedona Cooperation Proclamation and the Case for Collaboration

Monday, November 17th, 2008

Without getting in Dutch with the key Sedona Conference principle that “what happens at Sedona, stays at Sedona” I thought I’d nevertheless write a post that focuses on the core topic at this year’s annual meeting, namely the case for cooperation in e-discovery.

According to the “Cooperation Proclamation” e-discovery is facing an unprecedented crisis:

“The costs associated with adversarial conduct in pre-trial discovery have become a serious burden to the American judicial system. This burden rises significantly in discovery of electronically stored information (“ESI”). In addition to rising monetary costs, courts have seen escalating motion practice, overreaching, obstruction, and extensive, but unproductive discovery disputes – in some cases precluding adjudication on the merits altogether – when parties treat the discovery process in an adversarial manner. Neither law nor logic compels these outcomes. With this Proclamation, The Sedona Conference launches a national drive to promote open and forthright information sharing, dialogue (internal and external), training, and the development of practical tools to facilitate cooperative, collaborative, transparent discovery.”

These sentiments about the “broken” nature of the discovery process echo in many ways the draft findings from the Interim Report & 2008 Litigation Survey from the Fellows of the American College of Trial Lawyers which stated:

“The joint study grew out of a concern that discovery is increasingly expensive and that the expense and burden of discovery are having substantial adverse effects on the civil justice system. There is a serious concern that the costs and burdens of discovery are driving litigation away from the court system and forcing settlements based on the costs, as opposed to the merits, of cases.”

In both instances, the core notion is that “we’ve met the enemy and the enemy is us” because it’s the participants in the process who have collectively perverted the discovery process to the point it’s at today.

Sedona’s focus on this front has received at least some traction from the bench, as echoed in Mancia v. Mayflower Textile Servs. Co., 2008 WL 4595175 (D. Md. Oct. 15, 2008).  Mancia, written by leading e-discovery jurist Judge Grimm, was a fairly pedestrian employment litigation case where the parties had come to loggerheads over the e-discovery process.  Judge Grimm held that “[c]ourts repeatedly have noted the need for attorneys to work cooperatively to conduct discovery, and sanctioned lawyers and parties for failing to do so” citing both the Sedona Cooperation Proclamation and the Survey.

Judge Grimm also observed that these recent lamentations about the costs of civil litigation aren’t terribly dissimilar to those voiced eighteen years ago when the Civil Justice Reform Act of 1990, 28 U.S.C. §§ 471 et seq., was passed:

“Perhaps the greatest driving force in litigation today is discovery. Discovery abuse is a principal cause of high litigation transaction costs. Indeed, in far too many cases, economics-and not the merits-govern discovery decisions. Litigants of moderate means are often deterred through discovery from vindicating claims or defenses, and the litigation process all too often becomes a war of attrition for all parties.”

Given the fundamentally adversarial nature of litigation, the Sedona initiative is either dramatically ambitious or simply tilting at windmills.  While generally a skeptic by nature, I think that the bench’s early participation and downstream behavior modification is the linchpin to reforming the litigating masses.  Given the long term “sales” cycle involved here, I doubt if we’ll know whether this effort will gain real traction for at least several years.

Demystifying Concept Search in Electronic Discovery

Tuesday, October 28th, 2008

Concept or content search continues to be a hot topic within the e-discovery community.  There’s a continuous stream of articles that discuss it.  Some that point out the positive.  Others that point out the limitations.  The courts have also gotten involved in the discussion.  Judge Grimm refers to concept search in e-discovery in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  Judge Facciola discusses concept search in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 and other opinions.  Despite (or maybe because of) all the commentary on this topic, I find that while a lot of people think that concept search in e-discovery is good, many are not fully sure of exactly what concept search is, and how it is practically useful in e-discovery.   It’s pretty clear that after several years of commentary and hype, concept search has become something of a buzzword associated with many myths and misconceptions.  In an effort to better understand what concept search is and how it can help in e-discovery, I want to dispel two of the most common myths I have heard.

The “Concept Search is Concept Search” Myth

The first myth around concept search actually revolves around what it is.  In my experience, people tend to lump two different technologies together when talking about concept search: concept search and concept categorization.  It’s very common, for example, to see commentators say concept search even when what they are really talking about is concept categorization.  To make matters more confusing, people also use a plethora of other names including content search, content clustering or concept clustering when what they really mean is concept categorization.

So, what are the differences between concept search and concept categorization?  First, let’s start with concept search.  Concept search technologies find documents containing “concepts”.  I think that the Sedona Conference’s “Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery” provides a good definition of “concept” when used in a search context: “the combination of [a] query term and the additional terms identified by the thesaurus.”  In other words, concept search technologies find documents containing a specified term plus additional terms with similar meanings derived from a thesaurus.
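To make that definition concrete, here is a minimal sketch of thesaurus-based query expansion. The thesaurus entries, the sample documents and the function names are all invented for illustration; commercial concept search tools are considerably more elaborate.

```python
import re

# Hypothetical thesaurus: each query term maps to terms with similar meanings.
THESAURUS = {
    "fire": {"terminate", "dismiss"},
}

def expand_query(term, thesaurus=THESAURUS):
    """Return the 'concept': the query term plus its thesaurus terms."""
    term = term.lower()
    return {term} | thesaurus.get(term, set())

def concept_search(term, documents, thesaurus=THESAURUS):
    """Return the ids of documents containing any term in the expanded concept."""
    concept = expand_query(term, thesaurus)
    hits = []
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & concept:
            hits.append(doc_id)
    return sorted(hits)

# Made-up sample documents.
DOCS = {
    "doc1": "We decided to terminate the contract.",
    "doc2": "Lunch is at noon.",
    "doc3": "They will fire the vendor.",
}
```

With this sample data, a search for "fire" would also pick up the document that only says "terminate", which is the whole point of searching on the concept rather than the bare keyword.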

Concept categorization, on the other hand, is actually not a search technology at all.  Concept categorization technologies do not “find” documents.  Rather, they categorize or group documents based on their similarity.   There are many different ways to group documents based on similarity.  Techniques include statistical (which assesses similarity based on word frequency), Bayesian classification (which weights words differently depending on factors in addition to statistical frequency, such as where the terms appear in a document), and semantic indexing (which takes into account the fact that many words used in a similar context may have a similar meaning).  It would take more time to describe these technologies in detail but the Sedona commentary has a good summary of these different technologies if you are interested in learning more.
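As a rough illustration of the statistical approach only, the sketch below groups documents by word-frequency similarity. The cosine measure, the 0.5 threshold and the greedy grouping rule are my own assumptions for the example, not something taken from the Sedona commentary or from any particular product.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count how often each word appears in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Similarity of two texts based purely on word frequency (0.0 to 1.0)."""
    ca, cb = word_counts(a), word_counts(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_by_similarity(documents, threshold=0.5):
    """Greedy grouping: each document joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    groups = []
    for doc_id, text in documents.items():
        for group in groups:
            if cosine_similarity(text, documents[group[0]]) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups
```

Note that nothing here is "searched" for: there is no query at all, just documents being sorted into piles, which is why it is misleading to call this concept search.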

As should now be apparent, these technologies are very different and using the same words to describe them is confusing.  So it’s not surprising that a lot of the users of e-discovery services and software don’t have a strong understanding of what these technologies are or what benefits they can actually provide in practice.  Dispelling the myth that they can be lumped together is a critical first step in any conversation about concept search and how it can help in e-discovery.  This leads us to a second myth, that Concept Search is better than Keyword Search.  I’ll discuss this in my next blog post.

The “Artful” E-Discovery Dodger

Monday, October 13th, 2008

E-Discovery search has become a hot topic of late (in blogs and in the news), and I think it’s pretty clear that the unwashed (attorney) masses still don’t really grok the importance of using a defensible search protocol.  Neither do they seem to understand the enhanced scrutiny that’s being applied by the judiciary.

Kipperman v. Onex Corp., 2008 WL 4372005 (N.D. Ga. Sept. 19, 2008) is another in what will assuredly be a long string of cases that demonstrate how easy it is for litigators to get wrapped around the axle of e-discovery search.  In Kipperman, the defendant (Onex) presented several motions to the court, including attempts to obtain relief from the need to produce email identified after searching several backup tapes.

During a previous hearing the court ordered Onex to search all the mailboxes on two tapes, as well as on an additional tape selected by Plaintiff. The court determined that despite Onex’s objections and representations, the backup tapes were “producing meaningful discoverable information.”  The court was nevertheless sympathetic to Onex’s burden and therefore weighed in with some guidance:

“The court did suggest, … , that Plaintiff be more artful with its search terms and that Plaintiff utilize a list of the people, provided by Defendants, to review whether all mailboxes needed to be searched.”

The court also gave Onex the chance to narrow the search terms.  Unfortunately, they didn’t seize the opportunity to provide a narrower list or a refinement of their search terms.  “As such, they agreed to search and restore all the mailboxes with the search terms provided by Plaintiff.”

Not surprisingly, Onex then sought relief from having to review and produce all of the results from the search because the “broad search terms resulted in thousands and thousands of irrelevant hits.”  For example, the search terms included the word “republic”, which was used to elicit emails regarding Republic Builders Products, one of the companies involved in this matter.

“Defendants claim that the search captured thousands of irrelevant pages due to one occurrence of the word ‘republic’ often related to Onex business interests having nothing to do with Magnatrax in the ‘Republic of France,’ ‘Republic of Ireland,’ and ‘Czech Republic’.”

Again the court reaffirmed its sympathy with Onex’s burden and yet denied the requested relief, in large part because Onex had been warned to be more “artful”:

“[T]he court is not unsympathetic to the massive amount of discovery involved in this matter, the considerable burden of working with it, and the overproduction that often comes with e-mail production. Therefore, the court gave Defendants numerous tools by which to reduce the burden of e-mail discovery, including an opportunity to limit Plaintiff’s search terms and an opportunity to provide a list by which the number of peoples and the number of boxes being searched could be reduced. Defendants did not take advantage of these opportunities. Defendants must now lie in the bed that they have made. Thus, Defendants’ objections on the basis of relevancy and volume are DENIED.” (emphasis added).

Needless to say, Kipperman is probably not all that atypical.  Attorneys everywhere have historically used blunt e-discovery search instruments and haven’t often run afoul of the judiciary.  Now, post Victor Stanley, et al, the playing field has changed dramatically.  It’s important to leverage best practices (from Sedona and others), craft a defensible search strategy, sample the results and “show your work.”  Missteps along the way, especially ones that the court has tried to help the parties avoid, won’t be met with much tolerance.

Judge Grimm, Victor Stanley, And The Problem Of “Black-Box” E-Discovery Search

Friday, August 22nd, 2008

Judge Paul Grimm’s recent opinion in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008) provides valuable guidance on one of the most important issues in e-discovery: how to conduct keyword searches in a defensible manner given that keyword searches are prone to produce over- and under-inclusive results.  The ruling suggests one of two approaches: either producing parties should adopt a “collaborative” approach to conducting keyword searches, whereby each party agrees on a search methodology; or, they should use a “best practices” approach, such as the one suggested by Sedona, where the producing party tests, samples, and iteratively refines searches so that they can demonstrate they have taken reasonable measures to reduce over- and under-inclusive results.

While the guidance is clear, following the guidance in practice is very difficult.  The primary reason for this is that the search technology being used in e-discovery today is not up to the task.  Specifically, today’s search technology suffers from three problems:

  1. The over- and under-inclusive tradeoff. Many technologies have been developed to address the tendency of keyword searches to miss relevant documents and produce under-inclusive results.  Wildcard and stemming technology has been developed in order to address the issue of finding common word variations in specified keywords.  Concept search has been designed to find documents containing words with similar meanings to the keywords in a search.  And fuzzy search technologies have been put in place to find misspellings of words. However, all of these suffer from the same problem: they produce too many non-relevant or “false positive” documents thus driving up the cost of review. For example, if someone runs the wildcard search “divers*”, then he or she not only gets the desired documents containing “diverse” and “diversity”, but also gets a large number of false positive documents containing “diversion”, “diversification”, and so on.  In the case of concept and fuzzy search, the problem is so great that these technologies to date have rarely been used in e-discovery.
  2. Too expensive to test, sample and refine searches. Today’s search technologies are largely designed to run one search at a time, not the dozens of searches that are typical in e-discovery. As a result, anyone trying to follow the best practices of testing, sampling, and refining each search will find themselves missing deadlines and running over budget because it takes so long. This also makes collaboration with the opposing party close to impossible, since there’s little time to iterate on – and agree upon – a set of keyword searches.
  3. Manual documentation. It’s not enough for producing parties to use best practices, they have to document them so that they can “show their work” to the court. Currently, documenting the search refinement process is mostly manual, with the result that it is either done inadequately or not at all.
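To illustrate the first problem, here is a minimal sketch that lists which actual word variants a wildcard such as “divers*” matches and how many documents contain each one. The function name and sample documents are hypothetical, and the sketch only handles a trailing “*”; it is not how any particular search engine expands queries.

```python
import re
from collections import Counter

def expand_wildcard(pattern, documents):
    """For a trailing-* wildcard, count how many documents contain each
    distinct matching word variant."""
    prefix = pattern.rstrip("*").lower()
    counts = Counter()
    for text in documents:
        # Use a set so each variant is counted once per document.
        for word in set(re.findall(r"[a-z]+", text.lower())):
            if word.startswith(prefix):
                counts[word] += 1
    return counts

# Made-up sample documents.
SAMPLE = [
    "Our diversity program",
    "A diverse team",
    "Traffic diversion ahead",
    "Portfolio diversification plan and diversion",
]
```

Seeing the variants side by side makes it obvious which ones (“diversion”, “diversification”) are dragging false positives into the review set, and keeping the counts gives you something to show the court.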

The reason why the search technology used for e-discovery has these problems is surprisingly simple: it’s because the technology was not designed for e-discovery in the first place. Rather, it was built for enterprise search, and was only later repurposed towards e-discovery.

The “Black Box” Of Enterprise Search

The core issue is that enterprise search technology has been designed to be a “black box”. Users enter a single search query into one end, and get results at the other, with no visibility into what happens in between. Going back to our previous example, when a user searches for “divers*” intending to find documents related to “diversity” or “diverse”, enterprise search engines give the user no visibility into the crucial step of query expansion and how it expands the search query into relevant and non-relevant terms like “diversion” and “diversification”. As a result, the user has no ability to minimize the false positives.

In the same vein, when a user enters multiple queries into a “black box” enterprise search engine, all of the queries run as a single search, and the user has no visibility into which results are associated with which query. For example, a user who searches for “hiring OR interview” will get the results for the combination of the queries “hiring” and “interview”. He or she won’t know that only 5 documents contained “hiring” while 100 documents contained “interview.”  This limitation makes analyzing, sampling and refining searches costly and time consuming.
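The kind of visibility that is missing can be sketched in a few lines. This is only an illustration with hypothetical function names and made-up documents, and it handles single-word queries only:

```python
import re

def tokenize(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def hits_per_query(queries, documents):
    """Map each single-word query to the sorted ids of documents containing it."""
    index = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    return {
        q: sorted(doc_id for doc_id, words in index.items() if q.lower() in words)
        for q in queries
    }

def combined_hits(report):
    """Union of all per-query hits, i.e. what a single OR search would return."""
    return sorted(set().union(*report.values()))

# Made-up sample documents.
DOCS = {
    "d1": "Hiring freeze announced",
    "d2": "Interview schedule for Monday",
    "d3": "Second interview notes",
    "d4": "Quarterly budget",
}
```

A “black box” engine hands back only the combined list; the per-query report is what lets a reviewer see which term is responsible for which documents and refine accordingly.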

That’s not to say that enterprise search products like Autonomy or Endeca are flawed. Far from it.  Their “black box” design works exceedingly well for the simple and quick queries that people want to run across the enterprise for general business purposes. If a sales manager is looking for a single proposal for her meeting the following day, then she doesn’t care how the search was performed or if it’s over-inclusive.  She’s only interested in the first page of relevant results, and for that use case enterprise search engines do a great job.

But e-discovery is a whole different world.  In e-discovery, users typically must review every single document in the search results, not just the most relevant ones.  As a result, over-inclusive searches can dramatically increase the costs of downstream production and review.  And under-inclusive searches raise the issue of defensibility.  Finally, e-discovery users have to run a lot of search queries and understand which documents are associated with each of those queries.

So, going back to the original problem, if current search technologies cannot help lawyers and litigation support professionals follow Judge Grimm’s guidance and address the “well-known limitations” of keyword search, what can? That will be the subject of my next post.

E-Discovery Advice: “No Ask-y, No Get-y”

Monday, April 21st, 2008

In a time before e-discovery, I toiled away alongside a partner at Chapin, Fleming and Winet – Larry Shea. While not reducing his legal sagacity to one pithy catch phrase, his “no ask-y, no get-y” line is nevertheless a truism I often ponder.(i)

As a green associate, fresh out of law school, I had a number of idealistic (read: naïve) assumptions about how litigators wrangled over discovery disputes. One day, while dealing with a particularly thorny electronic discovery problem, I came to Larry and told him what I thought we wanted and why we needed it in a specific format. I knew that the opposition wasn’t likely to grant our e-discovery request, partially because they’d surely intuit how badly we needed it. Larry simply responded with his truism and explained that if we didn’t express our wishes we’d (a) likely not get what we wanted and (b) would not have established our position if push came to shove with the judge.

Well, I just read a recent case (Autotech Techs. Ltd. P’ship v. Automationdirect.com, Inc., 2008 WL 902957 (N.D. Ill. Apr. 2, 2008)) and it showed me that no matter how evolved the legal discovery process has become, the basic “no ask-y, no get-y” notion still applies.

In Autotech, the issue surrounded the production of electronically stored information (ESI) per Fed.R.Civ.P. 34(b)(2)(E), which basically says that documents must be produced as they are kept in the “usual course of business” or in a “reasonably usable form.” Significantly, section (iii) also states that a party need not produce the same ESI in more than one form.

Unfortunately, the requesting party (ADC) didn’t specify a form for the production of the document at issue, so “Autotech had the option of producing it in the form in which it was ordinarily maintained, or in a reasonably usable form.” Similarly, ADC did not specify that it wanted metadata as a part of the responsive document production. The court was not sympathetic to ADC’s requests: “It seems a little late to ask for metadata after documents responsive to a request have been produced in both paper and electronic format.” The court ultimately found that “ADC was the master of its production requests; it must be satisfied with what it asked for.”

In other words, “no ask-y, no get-y.”

Yes, this all seems so simple, but parties still are routinely stepping in this same pothole. Useful e-discovery best practices to avoid this predicament follow along these lines:

  1. Determine what format of ESI production you’re going to require. This sometimes isn’t as easy as it sounds since there are a number of permutations of review environments, even for common platforms such as Concordance and Summation. Work backwards with the attorney review team and their litigation support personnel to figure out what you’ll need and the type of “load files” that are required.
  2. Determine if you’ll likely want metadata. Absent any specific guidance, it’s fair to assume you’ll want metadata for spreadsheets (to see the underlying formulas), in cases involving computer forensics and for matters involving granular document authenticity/chain of custody, to name a popular few. The challenge is that you may not know about some of these issues at the time of the early Meet and Confer conferences. This is particularly important since there is a “modest legal presumption in most cases that the producing party need not take special efforts to preserve or produce metadata.” Williams, 230 F.R.D. at 651 (quoting The Sedona Principles, Comment 12a). So, the opposition may be on pretty solid footing if they claim that they had no duty to keep the metadata if you don’t make your needs known early on.
  3. Ask for what you want. Here, you’ll want to get specific, especially if you’re wisely carving out certain data types for different handling. Documenting your requests is a good practice too.
  4. Prepare to substantiate your needs for #1 & #2. Courts aren’t very willing to entertain overly broad requests for metadata if there isn’t a showing of need. So, be prepared to be challenged and have a solid rationale for the e-discovery request.

(i) His saying, “if ‘ifs’ and ‘buts’ were candy and nuts it would be Christmas all year long” is another great pearl, but I couldn’t find a good case law tie-in.