Posts Tagged ‘search’

This Time It’s For Real: “iClearwell” Is Available On The iPhone And iPad

Monday, July 12th, 2010

On April 1st, we had some fun by revealing the magical properties of “Clearwell for the iPad.” In truth though, we were only half joking because, at the time, we actually had an application for the iPhone and the iPad in development.

As Clearwell’s user base grew, and we became a mission-critical application to so many people, we learned that our users want access to the product from anywhere, not just when at their desks. In particular, for Clearwell administrators, it’s a lot more convenient logging into cases or checking the status of processing on an iPhone than it is being tied to a computer. So we created this companion application for the iPhone and iPad so they could do just that, as well as view job details, email logs, and generally manage their Clearwell appliances while on the go.

The driving force behind this new application, which we call “iClearwell”, is Gim, one of our developers. Gim also created a video to explain exactly what iClearwell does, which you can see below (yes, it really is his voice – and his pulsating finger).

iClearwell is available for free at Apple’s App Store. I have it on my iPad, and it rocks!

Automated Review in Electronic Discovery Re-Visited

Monday, June 28th, 2010

Almost two years ago I wrote one of my first blog posts, entitled “Review-less E-Discovery Review.” Despite the tongue twister of a title, the post posited that “there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.” I’d like to take a look and see how much (if at all) my prognostications have materialized.

A cynic might think that this is the moment where E-Discovery 2.0 jumps the shark.  But no, this isn’t one of those sitcom episodes where they flashback to previous shows as an easy way to recycle content.  Instead, it seems useful to see how the legal market has evolved from a litigation workflow perspective, particularly with some vendors touting the benefits of review-less technologies like predictive coding.

In the original blog, I noted that there was a “scenario where a non-manual review methodology may make sense” (while importantly noting that “this approach is not without risk”). Since my last post there has been the successful adoption of Evidence Rule 502, which makes this methodology (at least conceptually) safer.

But again (imagine dreamy flashback mode), here were the guidelines I previously proffered:

  1. Large data set.  This may sound a bit obvious, but a non-manual approach is best suited for large, unwieldy data sets.  The corpus doesn’t need to be in the terabytes, but the data set should be evaluated in terms of discovery processing costs and attorney review estimates.
  2. Short Production Timelines.  Once the above calculations are conducted, the next step is to determine if a human based review could even conceivably be conducted in the given time frame.  In many instances, an eyes-on review process just won’t be feasible since there won’t be enough bodies to throw at the problem.
  3. Next Gen “PAR” Tools.  In order to pull this “review-less” review process off, both safely and quickly, the responding party needs to have access to fast, robust processing, analysis and review (“PAR”) tools.  Certainly, it’s possible to have this scenario work with an e-discovery service provider, if they have the capability.
  4. Relatively Small Amount in Controversy.  For the time being, this approach should not be considered for any “bet the company” litigation, nor anything with significant downside risk (governmental inquiries, punitive damages, class actions, 2nd requests, etc.).  Yet, for many standard commercial lawsuits, corporate investigations, HR claims, etc. this review-less approach may be worth considering.
  5. Ability to Use a Clawback Provision.  Entering into a clawback provision with the opposition is mandatory in this methodology since the chances of an inadvertent production are statistically ever-present.  Yet, until Evidence Rule 502 is resolved, there will always be a risk that the clawback won’t be enforceable against 3rd parties.
  6. Non-governmental Production.  Most information in governmental productions becomes part of the public record, meaning that a clawback isn’t going to be feasible.  Here, trade secret information, personally identifiable data and the like would be disastrous if pushed out into the public domain.

The goal of this post is to see if this dog is any more ready to hunt than it was two years ago.  The short answer (right now) appears to be: No.

We all know that litigators are both risk-averse and generally slow to adopt new technology approaches. This is particularly true when there’s a perception that they won’t have insight into the technological black box behind automated coding/tagging decisions. Litigators are understandably sensitive about the ability to prove up the reasonableness of their search and review processes. This “reasonableness” requirement lines up both with the Victor Stanley requirements and FRE 502(b), which eliminates the chance of a waiver only “if the holder of the privilege or work product protection took reasonable precautions to prevent disclosure.”

Given this ongoing hesitancy, the question remains: shouldn’t we be seeing more movement in automated review than the glacial progress achieved to date, particularly given the known shortcomings of the eyes-on review process? Most are familiar with the 1985 STAIRS study by Blair and Maron, in which lawyers believed their Boolean keyword searches had found 75% of the relevant documents when they had actually found only 20%.

But, despite the known deficiencies of eyes-on review, it falls into the “go with the devil you know” mindset that often makes sense when dealing with judges and juries who aren’t likely to grok newer-fangled approaches.

In addition to these high-level, almost dogmatic challenges, there is one other tactical element I’d add to my previous list (of 6 factors).

7. All documents processed up-front (no rolling collection). I’ve heard some in-the-trenches e-discovery experts claim that they’ve never had a case that didn’t involve at least some level of incremental data collections. Whether this is an overstatement is immaterial. The fact is that a large number of e-discovery projects involve ESI that is collected (and then processed) in dribs and drabs. This is often a good thing, largely attributable to the incremental (start slowly) nature of a well thought out e-discovery project, where a smaller number of initial custodians are processed, then ECA is conducted, and only then is the additional ESI added to the corpus. This common methodology causes some significant heartburn for a review-less methodology, since the ever-changing nature of the corpus makes it difficult or impossible for a sample to be truly extensible to what will eventually be the entire data set. For this reason, the review-less approach should be limited to cases where the entire corpus is collected and processed at once.

In sum, the seven foregoing factors appear to still be largely valid and create an environment where an automated, review-less methodology will only make sense in a relatively rare set of circumstances. This may change in the future, but given the risk-averse DNA of most litigators I can’t imagine this tipping point happening any time soon.

Courts Undecided on How to Handle Email Threads in Electronic Discovery

Monday, June 21st, 2010

Much of the business and personal productivity that comes in the digital world  is from email and its unique abilities. Email allows us to communicate in a way that helps us associate context to our discussions, namely in its ability to be chained into a sequential thread when email users reply to or forward emails they previously received. This accomplishes two important tasks: 1) it allows the person sending the reply or forward to get an understanding of the issues so he/she can craft a meaningful response, and 2) it allows the person receiving the response to understand that response in the context of other on-going discussions. Email programs such as Microsoft Outlook, Eudora, and Gmail help by automatically including content from prior emails, thus producing a long chain of reference.

It is no coincidence that emails thus constitute key evidentiary value in the context of litigation. The inherent value captured in emails is what makes email productions central to pre-trial disclosures and the electronic discovery that precedes it. Courts have long recognized that emails are a business record and subject to discovery. Establishing who said what in the context of a matter in dispute is greatly facilitated by examining the thread of emails recorded in email repositories. With respect to electronic discovery, however, email threading presents several unique challenges. The area of greatest confusion and uncertainty has been the determination of privilege when emails are exchanged with in-house counsel and attorneys and whether such emails are protected by attorney-client privilege or not. A central issue is the composition of privilege logs under these circumstances.

There are several legal opinions on the matter of intermingling privileged and non-privileged communications in an email chain. These opinions have left the matter with little clarity, especially regarding whether the entire email thread is privileged or whether individual emails must be separated out and classified as privileged, with a privilege log listing them. Typically, the most recent email in a thread contains all other emails in that thread. Separating out individual emails (i.e., the contained emails) from the containing email would allow for treatment of just the portions of the email thread that may have privilege. When such separation is permitted, some contained emails may be assessed as privileged while others may not. However, it is entirely possible that the contained email is also present as an independent email under possession of the same custodian or another custodian. When it is present, one could argue that the contained email can just be ignored, and if the corresponding email is responsive, one can ignore the contained email. But rarely does a collection include a complete set of custodians, so the question of whether the privilege log should include the contained item in question still remains. In terms of management of review, and for constructing a privilege log, treating the most recent email and all its contained emails as a single entity is less expensive and cleaner than separating and determining privilege status of each contained email.

Another complicating factor is simply a determination of privilege. Does the mere fact that an attorney was listed as a courtesy CC recipient make the entire email privileged? And, when such emails are then forwarded only to an attorney involved in the case, with a legal strategy discussed in the containing email, is only the new content added to the containing email privileged, or does the privilege determination extend to the other contained emails?  Let’s examine a few opinions for guidance.

With respect to privilege there is a significant body of opinions that would suggest that only communications that explicitly seek legal advice are privileged.

With respect to internal communications involving in-house counsel, a party “must make a ‘clear showing’ that the ‘speaker’ made the communications for the express purpose of obtaining or providing legal advice”, Chevron Texaco Corp., 241 F. Supp. 2d at 1076 (quoting In re Sealed Case, 737 F.2d 94 (D.C. Cir. 1984)). If the legal and business advice are inextricably intertwined, “the legal advice must predominate over the business advice, and not be merely incidental, for the communications to be protected under attorney client privilege.” Evidently, attempts to include an incidental attorney in a thread would not offer privilege protections. However, the issue is complicated if the most recent containing email is indeed a genuine attempt to seek such guidance. Here again, there are two opinions. In United States v. Chevron Texaco Corp., 241 F. Supp. 2d 1065, 1074 n.6 (N.D. Cal. 2002), we note that:

“With respect to each series of emails for which Chevron asserts protection under privilege, Chevron breaks the series into each discrete message. In our view, such a representation of the document is misleading. Each email/communication consists of the text of the sender’s message as well as all of the prior emails attached to it. Therefore, Chevron’s assertion that each separate email stands as an independent communication is inaccurate.”

The above would have you prepare a single entity with the most recent containing email and all other quoted emails treated as a single unit. On the other hand, we see the opposite opinion in Universal Service Fund Telephone Billing Practices Litigation, 232 F.R.D. 669, 674 (D. Kan. 2005) where “the court strongly encourages counsel, in the preparation of future privilege logs, to list each email within a strand as a separate entry”. In a related ruling, the court notes: “Obviously, a sufficient (i.e., reasonably detailed) privilege log is vital if litigants and judges are to determine whether documents have been properly withheld from discovery.” As mentioned earlier, this can be much more expensive from a review and production standpoint.

In Chemtech Royalty Assoc., L.P. v. United States, Nos. 05-cv-00944, 06-cv-00258, 07-cv-00405, at (M.D. La. Mar. 30, 2009), we get another perspective: “Asserting privilege for an entire email thread in the privilege log, but only describing the last message in the thread is deficient.”

In Baxter Healthcare Corp. v. Fresenius Med. Care Holding, Inc., No. 07-cv-01359, 2008 BL 229777 at (N.D. Cal. Oct 10, 2008), the defendants are ordered to produce a privilege log that “separately identifies the author, recipient(s), copyee(s), and blind carbon copyee(s) for each logged email communication regardless of whether the communication is part of an email string”. The court directive is: “Each email is a separate communication, for which a privilege may or may not be applicable. Defendants cannot justify aggregating authors and recipients for all emails in a string and then claiming privilege for the aggregated emails.”

Thus, the contained emails must be treated as separate privilege log entries.

In Vioxx Products Liability Litigation, 501 F. Supp. 2d 789, 812 (E.D. La 2007) the court notes:

“Email threads in which attorneys are ultimately involved were usually listed on the privilege log as one message.”  Further, “Simply because technology has made it possible to physically link these separate communications (which in the past would have been separate memoranda) does not justify treating them as one communication and denying party a fair opportunity to evaluate privilege claims raised by the producing party.”

Again, the preference has been to separate out individual contained emails as independent emails with corresponding privilege log.

In C.T. v. Liberal School District, Nos. 06-cv-02093, 06-cv-02360, 06-cv-02359, 2007 BL 21826 at (D. Kan. May 24, 2007), the court orders the plaintiff to submit an amended privilege log that lists each email in a string as a separate entry.

In Se. Pa. Transport Authority v. Caremark PCS Health, L.P., 254 F.R.D. 253, 264-65 (E.D. Pa. 2008), the court recommends “analyzing emails in chain separately to rule on defendant’s privilege claims”.

Another significant opinion is found in Muro v. Target Corp., 250 F.R.D. 350 (N.D. Ill. 2007). In addition to at least four motions, an in camera review was requested for identifying the privilege status of eighty-nine documents. Here, the court ruled that FRCP Rule 26(b)(5)(A) does not require that all contained emails be separated out. Accordingly, the court sustained Target’s objection to the Magistrate Judge’s ruling that its privilege log was inadequate for failure to separately itemize each individual email quoted in an email string. In Muro, though, you are allowed to treat an entire email as a single entity only if the non-privileged communications in that chain are otherwise disclosed. Hence, if you wish to treat an email as a single unit, you are required to either disclose the individual contained emails from other custodians, or to list them as Derived Emails (see below).

Another important case is Rhoads Industries Inc. v. Building Materials Corp. of America et al., 2008 WL 5082993 (E.D. Pa. Nov. 26, 2008), where the court rendered the opposite opinion:

“Each version of an email string (i.e., a forward or reply of a previous email message) must be considered a separate, unique document, and therefore each message of the string which is privileged must be separately logged in order to claim privilege in that particular document.”

Of course, the context of the Rhoads opinion is the statement: “In the world of electronic communications, a series of email messages, among people employed by the client, but working in different locations, can replace the meeting with an attorney and subsequent letter.” However, this opinion is very debatable.

An entirely different approach is suggested in Apsley v. Boeing Co., No. 05-cv-01368, 2008 BL 12035 at (D. Kan. Jan 22, 2008), with the opinion “Although Boeing listed on its privilege log entire email strings, it redacted only the portion of the string that contained legal communications.” While this seems to be a perfectly reasonable approach, wouldn’t it compromise case strategy? The very fact that certain non-privileged, unredacted emails were exchanged with in-house counsel, and are therefore part of an attorney communication, can itself be damaging.

Suffice it to say, the courts differ in their opinions on how to handle email threads and their privilege logs. It is in this context that the Clearwell E-Discovery Platform’s treatment of email threads is extremely helpful for preparing your litigation response. In fact, Clearwell has received two patents related to email threading, one for constructing and ranking email threads and another for determining derived emails from other containing emails and de-duplicating them against original emails. Clearwell uses advanced email metadata and content analytics to piece together all emails of a thread. Furthermore, its Derived Email feature separates out contained emails as complete emails, which are then de-duplicated against other emails that are not derived from a contained email. In situations where such a duplicate is not identified, the derived email is maintained in a special state. Also, the containing email’s thread is separated out in such a way that each individual email’s privilege status can be determined. One can apply either a single- or multiple-record policy, satisfying whatever the prevailing opinion is from the bench. Also, Clearwell’s redaction capabilities and its ability to produce the same set of documents for multiple parties allow the case team to provide a quick turnaround if there is a motion to produce either a privilege log or the non-privileged snippets of emails. Such technology can be a lifesaver when it comes to meeting electronic discovery obligations.
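To make the idea of a contained versus a derived email a little more concrete, here is a minimal sketch in Python. It is emphatically not Clearwell’s patented method; it assumes, purely for illustration, that quoted messages are separated by an “-----Original Message-----” line and that two messages count as duplicates when their whitespace- and case-normalized text is identical. The function names and the three status labels are hypothetical.

```python
import hashlib
import re

# Assumption: quoted mail is introduced by a line such as "-----Original Message-----".
SEPARATOR = re.compile(r"^-+\s*Original Message\s*-+\s*$", re.MULTILINE)


def split_thread(body: str) -> list[str]:
    """Split a containing email into its newest message plus each contained message."""
    return [part.strip() for part in SEPARATOR.split(body) if part.strip()]


def fingerprint(text: str) -> str:
    """Hash case- and whitespace-normalized text so trivial formatting differences still match."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def derive_emails(containing_body: str, standalone_bodies: list[str]) -> list[tuple[str, str]]:
    """Label each message in a thread.

    "containing" - the newest message, which quotes the others
    "duplicate"  - a contained message that also exists as a standalone email
    "derived"    - a contained message found nowhere else in the collection
    """
    known = {fingerprint(body) for body in standalone_bodies}
    labelled = []
    for position, part in enumerate(split_thread(containing_body)):
        if position == 0:
            status = "containing"
        elif fingerprint(part) in known:
            status = "duplicate"
        else:
            status = "derived"
        labelled.append((status, part))
    return labelled


thread = (
    "Please advise on strategy.\n"
    "-----Original Message-----\n"
    "Can we ship Friday?\n"
    "-----Original Message-----\n"
    "Draft contract attached."
)
collected = ["can we ship   friday?"]  # the same message, collected from another custodian
labels = derive_emails(thread, collected)
```

With each contained message labelled this way, a privilege log can be built either as one entry per thread or as one entry per message, depending on which of the opinions above governs.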

Kroll Ontrack and Iron Mountain Stratify Demonstrate That “Free” Is Usually NOT The Cheapest Solution For Electronic Discovery

Tuesday, June 1st, 2010

Every car dealer knows he should focus customers on the monthly payment, not the total cost of the car. Every credit card solicitation (or sub-prime mortgage, for that matter) starts with the offer of 0% interest, not the actual interest rate or fees the customer will pay after the first 6 months. The reason is simple: once you lease the car or put a balance on the credit card, it’s very hard to switch away when – as often happens – you find yourself paying much more than you should later on.

I was reminded of these examples when reading about Kroll Ontrack’s offer of “free ECA” and Stratify’s recent press release announcing “free early stage filtering” for electronic discovery. Taking each in turn:

Kroll Ontrack Advanceview

Based on feedback from several customers in Washington DC, New York, and the Mid-West, Kroll Ontrack often provides Advanceview at no charge. That means customers can get “custodian de-duplication” and “1 keyword and date filter pass” for free, although Kroll still charges $200-250/hour for doing the work. The resulting data set is then processed and loaded into its review platform for $1,500-$1,800 per gigabyte.

Is this a good deal? For the vast majority of customers, the answer is “no” for three reasons.

First, customers typically end up paying more than they would using alternative products. For example, in the chart below, we compare the cost of using Kroll Ontrack to that of Clearwell for a 100 gigabyte project. In both cases, we assume customers are doing de-duplication, filtering, keyword searching, first pass review, and load file creation. As with any comparison of this sort, you have to make some simplifying assumptions. For example, we excluded data hosting fees and professional services fees from the analysis.

Whether customers are better off with Kroll depends entirely on how much data is culled out for free before customers incur the high, back-end charges. Given that all Kroll is doing for free is custodian de-duplication and running one set of keywords and date filters, the typical cull rate is likely to be anywhere from 20% to 50%, nowhere near the 80% cull rate required for Kroll to be more cost effective than Clearwell.
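The break-even logic can be sketched as simple arithmetic. The sketch below assumes a back-end price of $1,500 per gigabyte (the low end of the range quoted above) charged only on data that survives culling, and compares it with a flat per-gigabyte price charged on the full data set. The $300 flat price is a hypothetical figure chosen only because it reproduces an 80% break-even point; it is not a published Clearwell price, and hourly and hosting fees are left out.

```python
def back_end_cost(total_gb: float, cull_rate: float, price_per_gb: float) -> float:
    """Cost when only the data that survives culling is charged at price_per_gb."""
    return total_gb * (1.0 - cull_rate) * price_per_gb


def flat_cost(total_gb: float, price_per_gb: float) -> float:
    """Cost when every gigabyte is charged at a flat price_per_gb."""
    return total_gb * price_per_gb


def break_even_cull_rate(back_end_price_per_gb: float, flat_price_per_gb: float) -> float:
    """Cull rate at which the two pricing models cost the same."""
    return 1.0 - flat_price_per_gb / back_end_price_per_gb


PROJECT_GB = 100        # project size used in the comparison above
BACK_END_PRICE = 1500   # low end of the quoted $1,500-$1,800 per gigabyte
FLAT_PRICE = 300        # hypothetical flat price, for illustration only

for cull in (0.20, 0.50, 0.80):
    cost = back_end_cost(PROJECT_GB, cull, BACK_END_PRICE)
    print(f"cull {cull:.0%}: back-end ${cost:,.0f} vs flat ${flat_cost(PROJECT_GB, FLAT_PRICE):,.0f}")

print(f"break-even cull rate: {break_even_cull_rate(BACK_END_PRICE, FLAT_PRICE):.0%}")
```

Under these assumptions, a 20% to 50% cull leaves back-end charges well above the flat-price total, which is the point the paragraph above makes.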

The second reason why this is not a good deal is that it gives customers no certainty about costs. Culling rates from de-duplication and blind keyword searches are unpredictable and vary widely, meaning that some projects will cost more than expected while others will cost less. But every project has a budget that’s determined up front and, as any litigation support manager will tell you, you get much less credit for being under budget than you get pain for going over budget. That’s why cost certainty is one of the leading requests from anyone involved in electronic discovery.

Finally, excluding data based on a single round of keyword searches and date filters is not in line with The Sedona Conference best practices. Rather, Sedona recommends that customers iterate their keywords and culling strategies to hone them appropriately.

Iron Mountain Stratify OnPoint

It is not yet possible to do the same detailed analysis on Stratify’s OnPoint which offers “free early stage filtering”, because it’s impossible to tell exactly what that means. In its artfully-worded press release and data sheet, Stratify promises to provide “free processing and loading of unlimited data for early stage filtering”. Does that include de-duplication? Does that include any keyword searching? My guess is “no”, in which case all they are really doing for free is offering to load data into their review platform so that they can then charge you – not a very compelling offer. But if anyone does know the answer to these questions, or if Stratify would like to clarify exactly what’s being offered for free, then please let me know and I’ll post an update.

Once data is in Stratify’s system, it charges a “one-time fee starting at $500 per gigabyte” for “reviewable data”. But it does not say if that’s the only fee. What about monthly hosting charges? Fees for additional reviewers? Again, it’s not yet clear what the downstream cost of review really is using Stratify, so it’s impossible to know whether this is a good deal.

If there’s one lesson from all of this, it’s “buyer beware”. Just as when you buy a car, sign up for a credit card, or click on that offer to get more corn on Farmville, you need to look beyond the “free offer” and understand what it’s really going to cost you.

As the Electronic Discovery World Zurns

Wednesday, July 29th, 2009

Judge Grimm’s Victor Stanley case was lauded by many as one of the most significant electronic discovery cases of 2008, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators.

“[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

Despite legions of articles and blogs on the topic, at least certain portions of the bench haven’t taken heed. In the case In re: Zurn Pex Plumbing Products Liability Litigation, 2009 U.S. Dist. LEXIS 47636 (June 5, 2009) (hereinafter “Zurn”), U.S. District Judge Ann Montgomery receives points for understanding some basic e-discovery tenets around recall and precision, but then mysteriously goes where “angels fear to tread” by suggesting her own search terms.

Examining the case facts in more detail, Zurn is a class action products liability case where discovery was bifurcated (as is often the case – see Spieker v. Quest Cherokee) to first cover the class “certification” component. Initially, the Magistrate partially closed the door on broader ESI discovery, stating that “while ESI may prove to be relevant to the first stage of discovery, we cannot meaningfully make that prediction now, and require the parties to engage in what could be vastly more expensive, and yet utterly futile, discovery.” However, the Magistrate didn’t shut the door entirely, suggesting that “should the parties uncover voids in the information disclosed in hard copy form, they are . . . at liberty to press for further discovery including electronically stored information.”

Despite complying with Sedona’s Cooperation Proclamation (“The parties have worked amicably throughout the discovery process”), opposing counsel still ended up at loggerheads when the plaintiff found “voids” in the initial paper productions via third party discovery. The plaintiff brought a motion to compel ESI discovery and the defendant objected, stating two primary arguments: (1) the Magistrate earlier ruled out ESI discovery and (2) if they had to perform ESI discovery it would be unduly burdensome/expensive.

Judge Montgomery summarily rejected the first argument, but was concerned about the burden surrounding the proposed ESI discovery.  Here, the calculations get a bit confusing, but plaintiff’s request would have resulted in 361 gigabytes of ESI from employee email sources, as well as shared “J” and “K” drives.  The defendant multiplied the gigabyte number by 75,000 pages per gigabyte, which would have required “approximately seventeen weeks and cost $ 1,150,000, exclusive of vendor collection and processing costs, to review and process the data.”  Assuming a rather modest $1,000 per gigabyte for processing and hosting costs, defendants could’ve added another $400,000 for the project.
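Since the numbers are easy to lose track of, here is the arithmetic spelled out as a short Python sketch. Every input comes from the paragraph above; the only thing added is the multiplication. Note that $1,000 per gigabyte on 361 gigabytes works out to $361,000, which the paragraph rounds up to $400,000.

```python
def burden_estimate(gigabytes: int, pages_per_gb: int, review_cost: int, processing_per_gb: int) -> dict:
    """Reproduce the page-count and cost arithmetic from the defendant's burden argument."""
    pages = gigabytes * pages_per_gb
    processing_cost = gigabytes * processing_per_gb
    return {
        "pages": pages,
        "processing_cost": processing_cost,
        "total_cost": review_cost + processing_cost,
    }


estimate = burden_estimate(
    gigabytes=361,            # ESI volume resulting from plaintiff's request
    pages_per_gb=75_000,      # defendant's conversion factor
    review_cost=1_150_000,    # defendant's quoted review figure, in dollars
    processing_per_gb=1_000,  # the "rather modest" processing and hosting assumption
)

print(f"pages to review:  {estimate['pages']:,}")
print(f"processing cost:  ${estimate['processing_cost']:,}")
print(f"review + process: ${estimate['total_cost']:,}")
```

The page count of roughly 27 million lines up with the “27 million pages of documents” the court refers to in the passage quoted below.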

Ultimately, the court was not persuaded by the supporting affidavits, nor the attorney’s representations about the resulting burden:

“It is unclear whether Zurn’s cost and time numbers are based on a review of 27 million pages of documents, the 3.6 million pages of documents limited to the J Drive and custodians’ emails, or a smaller sample of document pages likely to be flagged as a result of a search for certain relevant terms proposed by Plaintiffs. The affidavit of Ms. Freestone, an attorney and not an expert on document search and retrieval, is not compelling evidence that the search will be as burdensome as Zurn avers.”

The 361 gigabytes apparently resulted from “hits” corresponding to plaintiff’s 26 search terms. The court correctly identified that those terms had precision issues (“many of Plaintiffs’ proposed search terms will likely produce a large number of ‘hits’ that have limited relevance in the case.”)

Unfortunately, in an effort to increase the search precision, the Judge did not take heed of Judge Grimm’s warning and surprisingly took matters into her own hands: “the Court will limit the search to the following fourteen terms based on the likelihood that they will  produce relevant documents without including a vast number of documents that are likely irrelevant to the litigation.”  Here is the Judge’s list of keywords:

(1) AADFW,
(2) Corrosion,
(3) Corrosive,
(4) Corrosive Water,
(5) Crack,
(6) De-zinc,
(7) Dezincification,
(8) DZR,
(9) Fail,
(10) IMR,
(11) Leak,
(12) MES,
(13) SCC,
(14) Stress corrosion cracking

Without looking at the underlying data, it’s clear from the outset that Judge Montgomery didn’t craft a good search strategy (as Judge Grimm might have predicted).  For example, terms 2, 3, 4 and 14 could’ve been captured by a single stemmed search using the term “corros*.” Without such a stemmed search approach, the terms would probably have been run singly in the proposed protocol, meaning that each one would’ve had tremendous duplication, thereby resulting in wasted attorney review time and processing costs.
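The stemming point can be checked with a few lines of Python. This is a minimal sketch that treats “corros*” as a case-insensitive word-prefix match, which is an assumption about how a wildcard search would behave and not a description of any particular e-discovery tool. The two sample documents are invented for illustration.

```python
import re

COURT_TERMS = [
    "AADFW", "Corrosion", "Corrosive", "Corrosive Water", "Crack", "De-zinc",
    "Dezincification", "DZR", "Fail", "IMR", "Leak", "MES", "SCC",
    "Stress corrosion cracking",
]


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a trailing-* wildcard such as 'corros*' into a case-insensitive word-prefix regex."""
    stem = re.escape(pattern.rstrip("*"))
    return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)


def terms_covered(pattern: str, terms: list[str]) -> list[int]:
    """Return the 1-based numbers of the terms that the wildcard pattern would also hit."""
    regex = wildcard_to_regex(pattern)
    return [number for number, term in enumerate(terms, start=1) if regex.search(term)]


def hit_counts(documents: list[str], terms: list[str]) -> tuple[int, int]:
    """Return (hits summed across separately run terms, distinct documents hit)."""
    per_term = [
        {index for index, doc in enumerate(documents) if term.lower() in doc.lower()}
        for term in terms
    ]
    return sum(len(hits) for hits in per_term), len(set().union(*per_term))


covered = terms_covered("corros*", COURT_TERMS)

# Invented sample documents: the first one is hit by three of the four corrosion terms.
docs = ["Corrosive water caused corrosion in the fitting", "Invoice for March"]
summed, distinct = hit_counts(docs, [COURT_TERMS[n - 1] for n in covered])
```

Running the terms one at a time counts the same document several times, which is where the duplicated review effort comes from unless the hits are de-duplicated first.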

Judge Montgomery did recognize the potential error of her ways and gave the parties an out:

“The parties may decide on a different set of fourteen terms if they choose to do so. Additionally, if the search, as ordered by the Court, proves to be overly burdensome or costly, Zurn may renew its objection by presenting the Court with specific information including evidence from computer experts on applying the search terms, the number of documents identified, and the cost and time burdens of vetting documents.”

This “specific evidence” language seems to track notions from Sedona’s search best practices protocol, which prescribes sampling and iterative search term refinement.  What is surprising is that knowing this she would nevertheless blindly proffer the 14 term search strategy.  Instead, she should’ve quoted Victor Stanley and required the parties to come up with a data driven approach that met requisite precision and recall metrics.
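For readers who want the two metrics pinned down, here is a minimal Python sketch of how precision and recall would be computed on a reviewed sample. The document IDs are made up for illustration, and the function name is hypothetical.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    true_hits = retrieved & relevant
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    recall = len(true_hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up sample: the search returned 5 documents, and reviewers found 10 relevant ones overall.
retrieved_ids = {1, 2, 3, 4, 5}
relevant_ids = {4, 5, 6, 7, 8, 9, 10, 11, 12, 13}

precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(f"precision {precision:.0%}, recall {recall:.0%}")
```

A data-driven protocol would re-run this kind of measurement on a fresh sample after each round of search term refinement, until both numbers reach whatever thresholds the parties agree on.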

EDRM Continues Drive to Solve Practical Electronic Discovery Problems

Tuesday, June 23rd, 2009

As most electronic discovery veterans are aware, the EDRM Project is an effort founded five years ago by George Socha and Tom Gelbmann to bring together a community of e-discovery practitioners for the purpose of solving some of the industry’s most challenging problems.

It may be hard to believe, but there was a time in the very recent past when the iconic EDRM model did not yet exist. No multicolored boxes, no arrows, no sloping volume and relevance lines. Nothing. Coming up with a standard way of talking about electronic discovery was the first problem that the group set about solving, and I think it would be hard to argue with the fact that they came up with the gold standard: a simple, clear, concise model that, at least so far, is standing the test of time as a way of thinking about the flow of the e-discovery process.

With each passing year, the group has started to address a broader set of problems, all with a practical bent.  Currently, there are eight:

Project: Goal

  • Evergreen: Keep the EDRM model fresh and relevant as the industry grows and evolves
  • XML: Provide a standard, generally-accepted XML schema to facilitate the movement of electronically stored information from one step of the e-discovery process to the next
  • Metrics: Provide an effective means of measuring the time, money, and volumes associated with e-discovery activities
  • Code of Conduct: Develop aspirational voluntary ethical guidelines for e-discovery providers and consumers
  • Search: Provide a framework for defining and managing the various aspects of search as it applies to the e-discovery workflow
  • Data Set: Compile a 100 gigabyte public data set that can be used to test various aspects of e-discovery software and services
  • Jobs: Provide a professional resource for the e-discovery community and communicate about e-discovery related jobs
  • Information Management: Explore the emerging need for e-discovery standards in information management (the “upstream” part of the process)

This year’s annual EDRM conference took place back in May. After years of meeting in the same chilly and wind-swept location in downtown St. Paul, Minnesota, George and Tom had the brilliant idea of spicing up the meeting a bit by moving it to a more exotic locale: Bora Bora! Plans were set in motion, but quickly the overwhelming feedback came back from EDRM members: E-discovery is so fascinating, so heart-warming, that adding Bora Bora to the mix would simply be too much for the vast majority of the participants to bear. So St. Paul it was!

This was Clearwell’s third EDRM conference, and location aside, it’s been fascinating to see how it has changed over the last few years. Here are several notable trends from this year’s kickoff:

  • More participation from end-users: There was a definite increase in the number of end-user/consumer participants (that is, those not from the vendor community), particularly from law firms. This could be taken as further evidence that e-discovery is indeed moving in-house.
  • Increased enthusiasm to take on new challenges: One of the great things about EDRM is its willingness to try to tackle new areas that aren’t being directly addressed by some of the other (fantastic) organizations out there like Sedona. This was in evidence several years ago, when Clearwell was fortunate to get involved in the early stages of the EDRM XML project, which has proven to be a huge time, cost, and risk reducer for many in the industry by providing a common standard that can be used to move data within the e-discovery process. It was in evidence last year when Clearwell’s CTO was able to help launch a new effort around Search that is seeking to develop standards and best practices in an increasingly complex and contentious area. And, finally, it was in evidence this year with the launch of the Information Management project, a cutting-edge group that is exploring how to solve the challenges that e-discovery poses for information management – certainly a complex area in need of thought leadership.
  • Improved collaboration: One thing that has amazed us from day one is how collaborative EDRM is, and continues to become. There are a lot of e-discovery vendors involved who, outside of the confines of the St. Paul Hotel, aggressively compete in the marketplace. However, George and Tom have been able to create an environment at EDRM where competitive spirits are set aside and ideas can be cultivated which provide huge value across the e-discovery landscape (both vendor and consumer).

One final note: If you’re an e-discovery practitioner in a law firm or corporate setting, I’d encourage you to get connected, either informally (through the EDRM web site) or formally (by signing up for one or more of the projects). While end-user involvement continues to grow, there is definitely still a need for more non-vendor involvement. It is critical in ensuring real and relevant problems get solved, and to pushing the state of the art in e-discovery forward. Please join us!

Electronic Discovery Services: The Price is Right?

Wednesday, June 17th, 2009

Maybe this will show my age, but I’ve been around the electronic discovery business since the days when pricing was both simple and very expensive. Terabytes were at the mythical high end of the spectrum and a gigabyte of “e-docs” (not “ESI”) cost $3,000 – $4,000 to process. Understandably (and fortunately for most), pricing models have evolved, thanks in part to more educated consumers and initiatives such as Sedona’s RFP + Vendor Panel.

Leaving the WABAC machine and moving into present times, we’re starting to see some variance from traditional pricing models that primarily focus on data “into” the processing machine. More and more companies (such as Kroll Ontrack) are moving to models that price on data “out” of the process. Since that’s a bit nebulous, an example might illustrate:

Traditionally, in a somewhat simplified fashion, an electronic discovery project would be priced by the amount of data in the initial corpus (say 100 gigabytes) and processing would be priced at $500 a gigabyte (for round numbers purposes). Leaving out the sometimes significant caveat that the 100 gigabytes would likely increase due to expansion of compressed files, this would mean that the bulk of the project expenses would be $50,000 ($500 x 100), plus relatively nominal costs for monthly hosting and user access rights.

At the end of the day, after elimination of system files, deduplication and application of search terms (reducing the initial corpus by say 70% collectively) there would be 30 gigabytes remaining for hosting and possible production, both of which are most often priced separately.

Given rampant commoditization, there’s an arms race underway among certain service providers where they’re now changing the above model to give away initial processing as a loss leader, pricing only on the data that comes out the end of the processing/search step. In this approach the above workflow would largely stay the same, but the vendor would charge a higher rate for what ultimately is hosted on the back end. If this back-end fee were $2,000 per resulting gigabyte and the same 30 gigabytes came out the back end, then the customer would pay $60,000 for the project. But if the deduplication, searching, culling, etc. were more effective (at say 80%), then the resulting 20 gigabytes would only cost $40,000.
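For readers who want to play with the numbers, here is a rough Python sketch of the two pricing models using the round figures from the example above. The function names are made up for illustration, and the sketch deliberately ignores hosting fees, user access charges, and expansion of compressed files, so treat it as back-of-the-envelope arithmetic rather than anyone's actual rate card.

```python
def price_on_data_in(corpus_gb, rate_per_gb):
    """Traditional model: charge for every gigabyte that enters processing."""
    return corpus_gb * rate_per_gb


def price_on_data_out(corpus_gb, culling_pct, rate_per_gb):
    """Back-end model: charge only for what survives culling.

    culling_pct is a whole-number percentage (70 means 70% of the corpus
    is removed by system-file elimination, deduplication, and search).
    """
    remaining_gb = corpus_gb * (100 - culling_pct) / 100
    return remaining_gb * rate_per_gb


def break_even_culling_pct(in_rate_per_gb, out_rate_per_gb):
    """Culling percentage at which both models cost the same."""
    return 100 * (1 - in_rate_per_gb / out_rate_per_gb)


if __name__ == "__main__":
    corpus_gb = 100  # initial corpus from the example
    print("Priced on data in:", price_on_data_in(corpus_gb, 500))
    for pct in (70, 80):
        print("Priced on data out at", pct, "% culling:",
              price_on_data_out(corpus_gb, pct, 2000))
    print("Break-even culling %:", break_even_culling_pct(500, 2000))
```

With these illustrative rates, the two models cost the same at a 75% culling rate. Anything below that and the back-end model is the more expensive choice, which is exactly the bet the buyer is being asked to make.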

The question then, as Clint Eastwood would put it, is: “Do you feel lucky?” This pricing model forces attorneys and litigation support managers to guesstimate what culling, search, and de-duplication rates they’ll likely get on the data corpus. Guess right and they save the end client money; guess wrong and they’re way over budget.

The dynamics of this purchasing decision are a bit atypical because the buyer (usually counsel) doesn’t pay the bills, so the decision can often be more vexing than most. When a direct consumer gambles on pricing, things will ideally balance out over time, with money being saved in some instances and overspent in others. But when the buyer doesn’t pay the bills, the motivation is less clear.

Thoughts run to Maslow’s hierarchy of needs to determine which pricing model is ultimately more compelling: (a) price certainty/adherence to budget, or (b) cost variability and the opportunity to save money. While it’s never good to understate the upside of saving money (Esteem), I think ultimately there’s a more fundamental need (Safety) to stay within budget and avoid the painful (sometimes client-imperiling) call to discuss how a given e-discovery project has gone way over budget.

This calculation is made even more vexing because it not only pits the purchasing party against unknown data culling/searching rates, but also puts the vendor in an ethical bind: vendors make less money if they’re supremely effective at data reduction, whereas if they benefit, intentionally or accidentally, from relatively little data reduction, they stand to make a ton of money.

It’s like you went to Vegas to gamble your kid’s college fund and on top of the already questionable house odds you knew that the dealer stood to profit by your losses. So, as for myself, no, I don’t feel lucky.

Five Electronic Discovery Questions Regarding Inaccessibility With David Isom

Thursday, April 30th, 2009

David Isom and I have collaborated a number of times over the years on a variety of electronic discovery presentations and articles.  So, when I saw that California was proposing new state electronic discovery rules that had some interesting variances vis-à-vis the FRCP, I thought David might be able to give us the benefit of his unique and sage perspective.

1. David, as the author of the definitive piece about inaccessibility under the Federal Rules of Civil Procedure (The Burden of Discovering Inaccessible Electronically Stored Information: Rules 26(b)(2)(B) & 45(d)(1)(D)), how many litigators do you think really understand and use these provisions?

I sense that litigators with a basic understanding of the new electronic discovery rules know that the inaccessibility rule exists and provides some protection for parties against unduly burdensome discovery.  Few seem to have noticed that Rule 45 contains an inaccessibility provision whose language is similar to the Rule 26(b)(2)(B) inaccessibility protection for parties, but whose protections as applied to subpoenaed nonparties are greater than the protections for parties.  Here are the three most basic and exciting (or excruciating, depending upon your side of the fence) impacts of the new inaccessibility rules:

(1) The inaccessibility rule has completely changed a nonparty’s leverage to narrow subpoenas seeking electronically stored information (ESI).  Subpoenaed nonparties now have protection against fishing expedition subpoenas that did not exist before — to narrow subpoenas, or to require the payment of costs and attorney fees in responding to broad subpoenas.

(2) Cost-shifting, for parties as well as nonparties, is now controlled by the inaccessibility rules.  Several federal courts have recently held that discovery cost-shifting is allowed only if these inaccessibility rules provide for cost-shifting under the circumstances.

(3) The inaccessibility rules must be asserted, and asserted in a timely manner, if they are to provide protection. For example, after counsel for nonparty Office of Federal Housing Enterprise Oversight spent $6 million of our money responding to a subpoena in In re Fannie Mae Securities Litigation, 552 F.3d 814 (D.C. Cir. 2009), counsel tried to recover the money on an inaccessibility cost-shifting argument. To which the United States District Court and the Court of Appeals for the District of Columbia said, in essence: you might have had a good idea, and saved your client $6 million, had you raised the arguments before agreeing to produce the documents and spending all that money. But you agreed to produce the ESI and cannot come back now and get any protection. You should have studied the inaccessibility rule.

2. So, assuming we’re still early in the learning curve, do you think these FRCP provisions are really gaining traction either in practice or in the case law?

Judging by the number of reported decisions, the inaccessibility rules are receiving as much attention as the other new features of the federal electronic discovery rules.  Which, I suppose, is damnation by faint praise — a large percentage of the reported cases are about what should happen because lawyers didn’t understand or apply the rules properly. Cason-Merenda v. Detroit Medical Center, 2008 U.S. Dist. LEXIS 51962 (E.D. Mich. July 7, 2008) is a good example.  There, defendant’s counsel produced ESI without any objection and without pre-identifying the ESI as inaccessible.  After production, counsel tried to get their opponents to share the cost of producing the allegedly inaccessible ESI.  The court correctly held that the ESI must be identified as inaccessible in advance of the production to give the seeking party the option to decide whether the discovery is really worth the candle, especially given the prospect that the cost of production might be shifted to the seeking party.

3. What are your thoughts on the new California state provisions regarding “inaccessible” ESI where they’re proposing a different treatment and slightly different burden? And, will this approach ultimately weaken responding parties’ abilities to make “inaccessible” claims successfully?

I am not an expert on California law, but am keenly interested in what the states are doing with electronic discovery.  As of this writing (May 2009), it appears that California Assembly Bill No. 5 has not yet been enacted.  Yet, here are some thoughts about how the inaccessibility provisions of this bill, if enacted, would compare to the federal rules of inaccessibility.  The bottom line is that the California bill is remarkably similar to the federal rules on inaccessibility issues.

Under the federal rules, a party seeking protection for inaccessibility initiates the process by “simply” (so far, the courts have tolerated fairly sparse identifications as satisfying this requirement) identifying the sources of information claimed to be not reasonably accessible because of undue burden or cost. The subpoenaed nonparty seeking protection can initiate by identifying the ESI sought as not reasonably accessible in an objection, motion to quash or motion for protective order. In the federal system, either the seeking party or the protecting party or nonparty can move to test the issue (one by a motion to compel, the other by a motion for protective order).

The California bill is nearly identical to the federal process. The bill provides that a person resisting a subpoena for ESI on inaccessibility grounds may “oppose” the subpoena. If this means that such a person can either object or move to quash or move for a protective order, it appears to be the same as the federal rule. The California bill specifies that a party resisting a production request on inaccessibility grounds initiates protection by identifying the types or categories of sources of electronically stored information that it asserts are not reasonably accessible. This is similar to the federal rule, whose text requires identification of “sources”, but whose committee notes clarify that merely “types or categories of sources” of inaccessible, responsive ESI need be identified. California’s Legislative Counsel’s Digest indicates that the process for protecting inaccessible ESI, apparently for both parties and subpoenaed nonparties, can be initiated by moving for a protective order, or by opposing or objecting to the subpoena or request.

Even if there are any distinctions in the above processes, the two processes appear to merge thereafter.  In both systems, the motions to test inaccessibility must be preceded by a conference of counsel to attempt in good faith to resolve the issue, together with a certificate that such an attempt has been made.  In both, the person seeking protection has the burden of proving inaccessibility (this is even true in the federal system where the process is initiated by the seeker’s motion to compel).  In both systems, if the holding party proves inaccessibility, the burden shifts to the seeking party to show good cause for producing the ESI, despite its inaccessibility.

And in both, if good cause is shown, the court may still impose conditions upon production, including cost-shifting.  In both, the factors that the courts are to consider in determining good cause are similar — more accessible, less burdensome sources; cumulativeness of the discovery; whether the burden or expense of the discovery would outweigh the likely benefit of the discovery, considering such things as the importance of the issues, the amount in controversy and the resources of the parties.  One possible difference between the California bill and the federal rules on good cause is that the California bill requires the court to limit discovery if any of the listed factors exists, where the federal rules and committee notes seem to envision a pure balancing.

In sum, the California bill essentially adopts the federal approach.

Some confusion has arisen because California commentators have drawn a distinction between the California bill and a misinterpretation of the federal rules.  One commentator, for example, stated that “under the federal rules, if ESI is inaccessible, the responding party simply doesn’t need to produce such documents.”  This ignores the affirmative identification duty that I discussed above.

4. With the rapid advancements in ESI restoration technologies, which the Comments to the Rule anticipated, are backup tapes in your mind still “inaccessible”?

The rules make it clear that inaccessibility cannot be measured by technology category alone.  The test does not depend upon the type of technology involved, but upon the balancing of need, technology, importance, spoliation, relevance, alternative sources and potential benefit against overbreadth, burden and cost.  So, if backup tapes are the only source available for important, relevant information because more accessible relevant sources have been spoliated, backup tapes will not be deemed inaccessible.  Without spoliation, if relevant ESI is available on active sources, backup tapes may not be discoverable.

Perhaps the main reason that categories of technology cannot be deemed per se accessible or inaccessible is that the technology is changing so fast.  Many search tasks that were expensive and difficult five years ago are much more doable now.

5. Finally, what do you think the future holds for these FRCP sections?

The inaccessibility rules will continue to be the main battleground where the great debates about the value and cost of electronic discovery will be fought, since these rules are specifically tailored to balance all of the interests in that debate.

Some groups are claiming that electronic discovery is wasteful and expensive, and that the new rules exacerbate the problem.  Of course, the federal rules ought always to be analyzed for problems and need for improvement, but I haven’t heard informed, thoughtful, helpful suggestions for improvements to the federal rules in the recent debate.  Overall, I see the adoption of the federal rules as having helped reduce the cost of electronic discovery, not increased the cost.

A Gross Inability to Craft Electronic Discovery Searches

Thursday, April 9th, 2009

The bashing of our judicial system seems to have reached a fever pitch. Groups like the American College of Trial Lawyers (“ACTL”) have proclaimed in a recent report that while the “civil justice system is not broken, it is in serious need of repair.” The blame game seems to have judges and attorneys alike pointing fingers. The Fellows of the ACTL (perhaps not surprisingly) seem to pin some of the blame on the judiciary:

“Judges should have a more active role at the beginning of a case in designing the scope of discovery and the direction and timing of the case all the way to trial. Where abuses occur, judges are perceived not to enforce the rules effectively.”

Groups like the Sedona Conference chalk up many of the ills to the failure to cooperate, so much so that they’ve orchestrated a cooperation proclamation – which has picked up enough support by the bench to have garnered several cites in the case law (see e.g., Mancia).

The bench, for its part, seems to put some of the onus on litigators and their reticence to get with the times. William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 2009 WL 724954 (S.D.N.Y. Mar. 19, 2009) is the latest example of such a proclamation. In this construction defect case, Judge Peck (a Sedona devotee) issues what he hopes will be a “wake-up” call to the bar about the need for “careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information (‘ESI’).” In Gross, the court had to mediate an e-discovery dispute in which the requesting parties propounded a blatantly over-inclusive search request. Unfortunately, the responding entity was a non-party, and it simply buried its head in the sand. This left the Court, in order to facilitate a resolution, in the “uncomfortable position” of having to craft a “keyword search methodology for the parties, without adequate information from the parties (and Hill).”

Judge Peck’s exasperation with these antics was palpable.  Summing up the problem by citing Judge Grimm and Victor Stanley he stated: “This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails.”  He further noted: “[w]hile this message has appeared in several cases from outside this Circuit, it appears that the message has not reached many members of our Bar.”

After noting both Sedona and Judge Facciola (of O’Keefe and Equity Analytics fame), Peck’s opinion reached a crescendo:

“Electronic discovery requires cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI. Moreover, where counsel are using keyword searches for retrieval of ESI, they at a minimum must carefully craft the appropriate keywords, with input from the ESI’s custodians as to the words and abbreviations they use, and the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives.’ It is time that the Bar-even those lawyers who did not come of age in the computer era-understand this.”

While it’s easy to see whom Peck blames in this brouhaha, it takes (at least) two to tango, meaning that litigants on both sides of the “v” must move beyond the typical “seat of the pants” electronic discovery wrangling. And judges need to be savvy enough to spot the issues to help/force the parties into such an enlightened/cooperative state. Nothing short of that will get the job done.

Government Launches Bold New Recovery Effort

Tuesday, March 31st, 2009

While we don’t normally report news on the blog, this article seemed important enough to repost in its entirety…

SEEKING NEW AVENUE FOR COST-CUTTING, GOVERNMENT LAUNCHES BOLD NEW RECOVERY EFFORT

WASHINGTON — Senior Administration officials today took the wraps off of their latest effort to stabilize the American economy: The nationalization of the electronic discovery industry. According to a senior official who declined to be identified, “Even before the beginning of the current turmoil, everyone acknowledged that electronic discovery costs were out of control. Now, with litigation accelerating and corporate earnings plummeting, something had to be done. Without this action, a significant number of leading American corporations would be in danger of shutting their doors due to the overwhelming burden of e-discovery.”

A Single Common Portal

Effective immediately, all electronic discovery projects are being centralized under a single authority, the National Electronic Record Discovery Institute (NERDI). The Institute will be launching a nationwide electronic discovery portal on April 1, 2009 at www.ediscovery.gov. The site will build upon the recent success of the government’s economic recovery accountability site, www.recovery.gov. Said one Institute official, “Just drop the ‘r’ and insert a ‘dis’, and you get eDiscovery. It really is the next logical step in the government’s efforts to help the country in a time of profound need.”

Industry experts initially expressed skepticism about the government’s ability to make electronically discoverable information available in an efficient, expedient, and secure manner. Early plans had the government using the U.S. Postal Service and the network of I.R.S. tax return servicing centers as the logistical backbone for managing the collection and processing of documents. However, after negotiations with the National Security Agency, this step was eliminated from the process. Instead, all electronically-generated information in the United States will be instantly processed and made available through the ediscovery.gov site. Commented an NSA spokesman, “We have all the information anyway; why not make it easily accessible, instead of pretending it’s not here?” As for security, officials stated that “individuals can expect the same level of security and identity protection they’ve come to expect from their financial institutions and credit card companies, along with the additional protection and responsiveness they’ve come to expect from the Federal government.”

The Future of the E-Discovery Industry

What will become of the existing electronic discovery industry, made up of hundreds of individual vendors with aggregate revenue estimated to be in the $2-3 billion range? According to a senior-level NERDI director, “One word: toast.” However, a group of industry software vendors and service providers has expressed open skepticism about the ability of a historically incompetent, multilayered bureaucracy to deliver electronic discovery services more effectively than the competitive market.

One vendor pointed out that it will be “difficult for the government to establish itself as a credible player in electronic discovery with millions of White House emails still missing without a trace.” In response, the group of vendors that make up the Top 5 Software and Service Provider lists on the 2008 Socha-Gelbmann survey (Autonomy, Clearwell, Fios, FTI, Guidance, Kroll, and LexisNexis) has announced an immediate consolidation of operations under the name ClearGuideAutoKrolLexFTios. Gloated incoming CEO Rick Wagoner, “Our expectation is to roll over the government’s efforts like our new name rolls off your tongue.”