Posts Tagged ‘Sedona’

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This post explores the promise of predictive coding technology, why predictive coding has not been widely adopted in eDiscovery, and why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page by page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or daydream, many believe computers can even predict document relevance better than their human counterparts.

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user.  This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if the email is inadvertently produced to the opposing party.

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern what language led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive set containing many irrelevant documents.  In information retrieval terms, this is a problem of low precision (over-inclusiveness) rather than of recall, and it is important because it increases the number of documents requiring manual review, which directly impacts eDiscovery cost.
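
The over-inclusiveness described above is usually measured with two standard information retrieval metrics: precision (the share of retrieved documents that are actually relevant) and recall (the share of relevant documents that were retrieved). The sketch below is only an illustration; the document IDs and counts are hypothetical and do not come from any real matter or tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents 0-9 are truly relevant.
relevant_ids = set(range(10))
# Hypothetical over-inclusive result: documents 1-40 are retrieved,
# which captures 9 of the 10 relevant documents plus 31 irrelevant ones.
retrieved_ids = set(range(1, 41))

precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

In this made-up case recall is high (9 of 10 relevant documents found) while precision is low (only 9 of the 40 retrieved documents are relevant), so reviewers would read 31 irrelevant documents to find 9 relevant ones.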

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use and complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology in a way that hasn’t been possible until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy, which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will be not only the Year of the Dragon but also the year of predictive coding.

Patents and Innovation in Electronic Discovery

Monday, June 13th, 2011

In the world of technology we live in, a huge amount of benefit is created when people apply certain well-known techniques to solve problems and create value for the broader community. Such techniques are often the result of painstakingly long and laborious research, driven primarily by academic institutions, with private industry either funding such research directly or co-opting it in its own work. When the industry as a whole recognizes a certain methodology, it gains popular usage.

In information retrieval, searching and retrieving relevant content from unstructured text has been a vexing problem, and we’ve had decades of the brightest minds applying their collective intelligence and the rigors of peer review to validate and establish the most effective way to solve a retrieval problem. And, research forums such as TREC, SIGIR and other information retrieval conferences establish a venue for advancing the state of the art. So, when Recommind announced that they have been issued a patent on Predictive Coding, I took notice, especially since it touches a nerve with those who believe research should be openly shared.

The patent lists six claims that describe a workflow whereby humans review and code a document and the coding decisions applied to the document sample are projected or applied to the larger collection of documents. Anyone who has even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback as a way to perform information retrieval has been studied for well over forty years, with a paper as early as 1968 by Rocchio J.J., titled Relevance Feedback in Information Retrieval. It falls under a category of methods broadly known as machine learning.

Any supervised machine learning system involves creating a training sample and using that sample to project into a larger population. The fact that one could claim patentable ideas on something that is so widely known and used is puzzling.  Any workflow that employs machine learning would include the steps of creating an initial control set, coding that by human review, and applying the learned tags to a larger population.  In fact, the Wiki article Learning to rank describes precisely the workflow that is claimed in the patent. Moreover, as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper with iterative sampling based evaluation and automatic expansion of initial query.  In that paper, we describe exactly the workflow postulated by the six claims of the patent.
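
To make the generic workflow concrete (code a small sample by hand, learn from it, project onto the unreviewed population), here is a minimal Rocchio-style relevance feedback sketch in plain Python. It is an illustration of the textbook idea only: it is not the patented workflow, not the method from the TREC paper mentioned above, and not any vendor's implementation. The function names, the whitespace tokenizer, the absence of an initial query term, and the beta and gamma weights are all assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts, scaled to unit length."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {term: v / norm for term, v in counts.items()} if norm else {}


def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def rank_by_feedback(coded_sample, unreviewed, beta=0.75, gamma=0.15):
    """Rank unreviewed texts using human coding decisions on a sample.

    coded_sample: list of (text, is_relevant) pairs from human review.
    Returns (score, text) pairs, highest score first.
    """
    relevant = [vectorize(t) for t, is_rel in coded_sample if is_rel]
    irrelevant = [vectorize(t) for t, is_rel in coded_sample if not is_rel]
    profile = Counter()
    if relevant:
        for term, weight in centroid(relevant).items():
            profile[term] += beta * weight   # pull toward relevant examples
    if irrelevant:
        for term, weight in centroid(irrelevant).items():
            profile[term] -= gamma * weight  # push away from irrelevant ones
    scored = []
    for doc in unreviewed:
        vec = vectorize(doc)
        score = sum(w * profile.get(term, 0.0) for term, w in vec.items())
        scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical coded sample and unreviewed documents.
coded = [
    ("merger pricing agreement draft", True),
    ("pricing agreement signed", True),
    ("lunch menu friday", False),
    ("office party friday", False),
]
ranked = rank_by_feedback(coded, ["friday lunch order",
                                  "revised pricing agreement attached"])
```

With this toy data the document about the pricing agreement ranks above the lunch order, because it shares terms with the sample documents a human coded as relevant. Real systems add better tokenization, term weighting, and iterative rounds of review.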

In terms of other prior art that would potentially invalidate the patent, the list is long. Let’s start with Text Classification. Text Classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the Proceedings of Sixteenth International Conference on Machine Learning, as well as in his book Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, published by The Springer International Series in Engineering and Computer Science.  Joachims is now a well-recognized Professor of Computer Science at Cornell University, and that work is widely cited as seminal in the area of machine learning and text classification. Interestingly, this work was cited by the Patent Examiner as prior art, but the inventors missed listing it. Nevertheless, that work and further work by several academics such as Leopold and Kindermann have already established the use of Support Vector Machines as a useful technique for machine learning. To claim the novelty of its use in automatically coding documents is, in my opinion, a hollow claim.

Another technology mentioned in passing is Latent Semantic Indexing (LSI). This was proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman R. in their paper, Indexing by Latent Semantic Analysis, in Journal of the ASIS, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching and text classification is also very widespread, and once again, it seems ridiculous to claim that it is something novel or innovative.

Next, let’s examine the use of sampling to validate the initial control set. Use of sampling for validation of a control set of documents is in fact such a widely known technique that most e-discovery productions employ sampling. In fact, the Sedona Commentary on Achieving Quality and the EDRM Search Guide recommend use of sampling to validate automated searches. Furthermore, several e-discovery opinions, such as Judge Grimm’s opinion in Victor Stanley [Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008)], suggest that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.

In short, we think the claims issued in the patent and the associated workflow are so commonly used that the workflow is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant a re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to pick up existing prior art and devise a similar workflow that achieves the same or better outcome, and any attempt to enforce the patent will likely be challenged.

But there is an even bigger issue at stake here beyond the status of Recommind’s patent: namely, shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license their technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery over two years before Recommind announced their first version of Axcelerate.

Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the sort of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect investments by an organization are fine, but process patents that attempt to patent a workflow are bad for business. With such an approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much more room to grow and develop) can be fully realized in a way that both provides vendors with the fair and just economic rewards they deserve while helping the legal system become radically more efficient.

How Do You Sample Electronically Stored Information (ESI) in E-Discovery?

Wednesday, February 9th, 2011

When confronted with an almost impossible data analysis problem, a tried and true technique to solve it has been the use of sampling. The mathematical analysis behind sampling is something that has been studied for quite a number of years. Sampling has also been put into practice for well over seventy years, in fields ranging from predicting the results of elections to assessing the quality of electric bulbs. Why not do the same for certifying your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the Electronic Discovery Reference Model (EDRM) Search Group authors covered in detail, with a strategy laid out in the comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). And, while much of that work is still to hit the mainstream litigation scene as a general practice, I was pleasantly surprised to see it receive attention from a fellow blogger and litigator, Nick Brestoff, who highlighted this in a very thoughtfully crafted article in Law.com, titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties in getting a certifiable result that attorneys can stand behind. And, it is highly likely that the current practice is to certify your electronic discovery without a real measure of validity behind it.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the one who has the data, i.e., the producing party. While Federal Rules of Civil Procedure (FRCP) Rule 26(a) lists required disclosures and Rule 26(g) sets out signing and certification guidelines, there is no agreed-upon way to specify sampling parameters or to report the results of sampling. It is in this context that Nick Brestoff’s article is significant: it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that there is a logistical problem with this: most litigators will balk at producing the largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to specify, in the Rule 26(f) meet and confer, that the production be certified for completeness by also including a statement on sampling and its results. A simple request such as, “Sample the data for 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing side can then perform random sampling per the goals of that request, selecting 13,526 documents (based on the sampling table of the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target.
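
The EDRM sampling table is not reproduced here, but the sample size quoted above can be approximated with the standard normal-approximation formula n = z² · p(1 − p) / e², using the worst-case proportion p = 0.5. The sketch below is a hedged illustration, not the Guide's own method: it assumes simple random sampling from a very large collection, and it assumes that the “2% error rate” means a total interval width of 2% (a margin of plus or minus 1%). That reading is an assumption, chosen because it is the one under which the formula comes out at the quoted figure.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, p=0.5, z=None):
    """Documents to sample so an estimate is within +/- margin.

    Assumes simple random sampling from a very large collection (no
    finite population correction) and the normal approximation to the
    binomial. p=0.5 is the worst case and yields the largest sample.
    """
    if z is None:
        # Two-sided critical value, e.g. confidence=0.98 -> 0.99 quantile.
        z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


n_rounded_z = sample_size(0.98, 0.01, z=2.326)  # z rounded to 3 decimals
n_exact_z = sample_size(0.98, 0.01)             # unrounded z
n_wider = sample_size(0.98, 0.02, z=2.326)      # if "2%" meant +/- 2%
```

With z rounded to 2.326 the formula gives 13,526 documents; with the unrounded critical value it gives 13,530. If the 2% were instead read as plus or minus 2%, the required sample would drop to 3,382, which is why the parties should state the margin explicitly when they agree on sampling parameters.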

In addition to the EDRM Search Guide, The Sedona Conference Working Group Commentary, Achieving Quality in the E-Discovery Process, is an indispensable resource for understanding the role of sampling. This paper discusses at length several sampling methods and their applicability for various purposes, including certifying that the results meet certain quality criteria. In addition, a number of electronic discovery cases have mentioned sampling as a way of overcoming the explosion of data volumes. A primary application of sampling is for evaluating proportionality claims, something that has moved from a simple assertion into an informed argument, with specificity on proving cost burden. Let’s examine a few.

Referring to the well-known Zubulake v. UBS Warburg, F.R.D. 280, the court in Makrakis v. Demelis, No. 09-706-C, 2010 WL 3004337 (July 13, 2010) ordered the producing party to sample just a small number of backup tapes, at the expense of the requesting party. This is also remarkable for shifting the cost of processing and reviewing the sample, however small, to the requesting party. Such measures, while reducing the overall costs of e-discovery, place a greater burden of sample selection on the requesting party, forcing them to apply the reasonableness evaluation.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery is appropriate and quotes an earlier case, S.E.C v. Collins & Aikman Corp, 256 F.R.D. 403, 418 (S.D.N.Y. 2009), that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling recommendation in this instance was both a reduction of number of custodians from forty to three, as well as a significant reduction in the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced drastically to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to run a small set of search terms against a smaller number of custodians so as to get a sense for the larger electronic discovery costs. Clearone Communications v. Chiang offers another example of sampling by the use of Boolean logic to combine more common search terms, thereby avoiding over-inclusiveness.

Per the Sedona commentary definitions, this type of sampling is referred to as “judgmental sampling” wherein the practitioner has a general sense of which of the several custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery sampling can embrace the benefits of statistical sampling as well. It is a natural next step, as even with the narrow sampling criteria of judgmental sampling, the cost of review can be high. One area where statistical sampling has an advantage is that quantifiable measures of error and confidence intervals are possible, while judgmental sampling has no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality and if the producing party needs a basis for certifying their productions, statistical sampling can be a powerful aid.
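
As a sketch of what “quantifiable measures of error and confidence intervals” can look like in practice, the snippet below estimates the share of responsive documents in a collection from a simple random sample and attaches a normal-approximation (Wald) confidence interval. The function name and the sample counts are hypothetical, and the approximation is only reasonable when the sample is fairly large and the proportion is not very close to 0 or 1.

```python
import math
from statistics import NormalDist


def proportion_interval(hits, sample_size, confidence=0.95):
    """Estimate a proportion and its normal-approximation interval.

    hits: number of sampled documents coded as responsive.
    Returns (estimate, lower_bound, upper_bound), bounds clipped to [0, 1].
    """
    estimate = hits / sample_size
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(estimate * (1 - estimate) / sample_size)
    return (estimate,
            max(0.0, estimate - half_width),
            min(1.0, estimate + half_width))


# Hypothetical review: 120 of 1,500 randomly sampled documents were responsive.
estimate, low, high = proportion_interval(120, 1500, confidence=0.95)
```

For these made-up counts the estimate is 8% responsive, with a 95% interval of roughly 6.6% to 9.4%. A judgmental sample offers no comparable statement, because its documents were not drawn at random.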

Top Five Predictions in Electronic Discovery

Monday, November 15th, 2010

What’s next in the electronic discovery world?  Well, it’s nearly impossible to say with too much precision, but my recent e-discovery trends article attempts to peer into the crystal ball to divine some hints about the future.

The following five predictions are what I expect to create the biggest waves in e-discovery in 2011.  Most are nascent trends that we’ve seen a bit of in 2010, but that should continue to accelerate next year.  Enterprises that can prepare for and understand these areas will be well equipped to continue taking a proactive approach to the ever-changing challenges of e-discovery.

  1. Changes in Forensic Best Practices: In 2011, manual forensic imaging will continue to take a backseat to more automated, forensically sound data collection techniques.  Forensic (bit for bit) images have long been the gold standard for the legally sound collection of ESI in response to legal proceedings.  And, while forensic imaging will continue to be important in a number of discrete situations (fraud, misappropriation of trade secrets cases, etc.), it will largely be seen as overkill in basic electronic discovery cases.  Since imaging is both time consuming and highly manual, automated collection tools will increasingly be used by savvy organizations to speed up and streamline the collection process.
  2. Consolidation in the Electronic Discovery Industry: Consolidation in the electronic discovery sector will impact market forces and the balance of power.  The past year saw traditional, pure-play electronic discovery companies looking (sometimes successfully and sometimes not) for diversification and deep pockets.  In the upcoming year, the relative dearth of pure play EDD companies may reverse the downward price pressure that’s been seen over the past several years.
  3. Proportionality Becomes Reality: Burgeoning data volumes, as seen in multi-terabyte (versus gigabyte) cases, mean that the legal community will continue to search for ways to prevent electronic discovery costs from exceeding legal exposure and attorneys’ fees.  Groups like The Sedona Conference will continue to push for better clarification within the community surrounding “proportionality” in order to keep the electronic discovery “tail” from wagging the litigation “dog.”  If successful at all, there may be a slight respite for litigious enterprises that may be able to better scale e-discovery efforts with the risk profile of the matter at hand.
  4. Collision of Cloud, Social Media and E-Discovery: The seemingly unstoppable migration of corporate data to the cloud, combined with the proliferation of social media applications, will continue to stress electronic discovery practitioners as they attempt to preserve, collect, search, and process electronically stored information (ESI) from sources that aren’t traditionally managed behind the firewall.  Proactive enterprises will increasingly evaluate the legal and compliance risks of storing data in the cloud so that they’re not painted into a corner when they need to preserve, collect, and produce offsite ESI.
  5. Global E-Discovery Matures: International jurisdictions will increasingly look to the United States (and the Federal Rules of Civil Procedure) as their nascent electronic discovery paradigms are increasingly stressed by the proliferation of both ESI and discovery disputes.  The recent Goodale case out of the UK (and impending procedural changes to the e-Disclosure Practice Direction) demonstrates how the global community is rapidly maturing along the electronic discovery continuum.

While the tools and best practices designed to combat top ediscovery hurdles continue to mature, the challenges are multiplying at an equally fast rate.  In the past, the crux of most discovery matters usually centered around email and sometimes instant messaging.  In 2011, new problems will continue to crop up on the horizon, such as collecting SharePoint data from the cloud, trying to extract structured data from a range of proprietary systems and capturing ephemeral ESI from an ever changing array of social media applications.

Please let me know if you disagree with any of the predictions or have any others you’d like to share.

Fulbright Litigation Survey Calls Out Need for More Proportionality/Rules Changes

Thursday, November 11th, 2010

Fulbright & Jaworski recently issued its “7th Annual Litigation Trends Survey Report” and there were several interesting trends worth noting.   Not surprisingly, the general pace of litigation is forecast to increase, relatively unabated, with more than 25% of respondents expecting their companies’ disputes to increase in the next 12 months.

Beyond this trend it’s clear that there’s also a groundswell of support for a movement towards more e-discovery proportionality.  While also a big topic at Sedona’s annual conference (and discussed in the recent Moody case), a whopping 79% of US respondents think the “US Rules of Civil Procedure should be modified in some way to limit e-discovery in civil cases.”  While I haven’t heard of any specific proposals for a rules amendment, it’s clear that folks aren’t happy with the status quo, particularly with the increasing discovery burden facing enterprises dealing with unilateral disputes.   This discontent is likely tied to the fact that costs continue to escalate, with the survey indicating that more than 40% of the largest US companies (over $1B in Revenue) plan to “increase their spending on e-discovery in the next 12 months.”

Finally, the survey also focused on an area that’s getting an increasing level of scrutiny.   Fulbright asked “when preserving potentially relevant information in litigation or an investigation, what methods do you use most frequently for preserving electronically stored information?”  Leading the pack, with 55% of the vote, was “rely on individual custodians to identify and preserve their own information.”  Custodian based collections have been discussed recently as being under fire in blogs and other recent cases such as Pension Committee and Ford Motor Co. v. Edgewood Properties Inc. The notion is that under- or un-supervised collection methodologies are dangerous because it’s relatively easy to paint the custodians at issue as either being motivated to hide responsive data or relatively unconcerned with compliance.  Nevertheless, it’s clear that (as of now) custodian-based collections are still somewhat “reasonable” given that more than 50% of the populace collects data this way.

On the other side of the spectrum from custodian based ESI collections, there are automated data collection tools and methods that can be considered too.  There are undoubtedly advantages (risk reduction, speed, audit trails, etc.) to using “automated search software” for the collection of data (like 43% of the respondents did in the Fulbright survey).  Yet, it’s clear this isn’t a zero sum game – meaning there’s currently a place for both methodologies in the legal landscape.  For many organizations it becomes a risk management exercise as summarized in a recent ARMA article entitled “Is ‘Manual’ Collection of ESI Defensible?”: “Companies may choose the manual collection of ESI to reduce costs, particularly if they have limited levels of litigation or lower risk levels posed by the litigation itself.”

In the end, like so many aspects of electronic discovery, almost any well thought out, well documented methodology *can* be defensible, but the onus is on the preserving/collecting party to buttress whatever poison they pick.  Defaulting into a method without preparation, auditing and follow-through is a recipe for disaster.

Kroll Ontrack and Iron Mountain Stratify Demonstrate That “Free” Is Usually NOT The Cheapest Solution For Electronic Discovery

Tuesday, June 1st, 2010

Every car dealer knows he should focus customers on the monthly payment, not the total cost of the car. Every credit card solicitation (or sub-prime mortgage, for that matter) starts with the offer of 0% interest, not the actual interest rate or fees the customer will pay after the first 6 months. The reason is simple: once you lease the car or put a balance on the credit card, it’s very hard to switch away when – as often happens – you find yourself paying much more than you should later on.

I was reminded of these examples when reading about Kroll Ontrack’s offer of “free ECA” and Stratify’s recent press release announcing “free early stage filtering” for electronic discovery. Taking each in turn:

Kroll Ontrack Advanceview

Based on feedback from several customers in Washington DC, New York, and the Mid-West, Kroll Ontrack often provides Advanceview at no charge. That means customers can get “custodian de-duplication” and “1 keyword and date filter pass” for free, although Kroll still charges $200-250/hour for doing the work. The resulting data set is then processed and loaded into its review platform for $1,500-$1,800 per gigabyte.

Is this a good deal? For the vast majority of customers, the answer is “no” for three reasons.

First, customers typically end up paying more than they would using alternative products. For example, in the chart below, we compare the cost of using Kroll Ontrack to that of Clearwell for a 100 gigabyte project. In both cases, we assume customers are doing de-duplication, filtering, keyword searching, first pass review, and load file creation. As with any comparison of this sort, you have to make some simplifying assumptions. For example, we excluded data hosting fees and professional services fees from the analysis.

Whether customers are better off with Kroll depends entirely on how much data is culled out for free before customers incur the high, back-end charges. Given that all Kroll is doing for free is custodian de-duplication and running one set of keywords and date filters, the typical cull rate is likely to be anywhere from 20% to 50%, nowhere near the 80% cull rate required for Kroll to be more cost effective than Clearwell.
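To make the break-even arithmetic concrete, here is a minimal sketch of the back-end math. It uses only the per-gigabyte range quoted above ($1,500-$1,800) and the 100 gigabyte project size. The alternative product's total cost is a hypothetical placeholder (the chart's actual figures are not reproduced here), and the function names are mine, not anything from either vendor. Hourly fees, hosting and professional services are left out.

```python
def back_end_cost(total_gb, cull_rate, rate_per_gb):
    """Cost of processing and loading whatever survives the free culling pass."""
    surviving_gb = total_gb * (1.0 - cull_rate)
    return surviving_gb * rate_per_gb


def break_even_cull_rate(total_gb, rate_per_gb, alternative_total):
    """Cull rate at which the back-end charge equals a competing total cost."""
    return 1.0 - alternative_total / (total_gb * rate_per_gb)


PROJECT_GB = 100
# Hypothetical placeholder, NOT a figure from the post or from any vendor.
HYPOTHETICAL_ALTERNATIVE_TOTAL = 30_000

for rate in (1_500, 1_800):
    for cull in (0.20, 0.50, 0.80):
        cost = back_end_cost(PROJECT_GB, cull, rate)
        print(f"${rate}/GB at {cull:.0%} cull: back-end charge ${cost:,.0f}")
    needed = break_even_cull_rate(PROJECT_GB, rate, HYPOTHETICAL_ALTERNATIVE_TOTAL)
    print(f"${rate}/GB needs a {needed:.0%} cull to match the placeholder total")
```

The point of the sketch is the shape of the curve rather than the exact numbers: at a 20% to 50% cull, most of the data still hits the per-gigabyte charge.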

The second reason why this is not a good deal is that it gives customers no certainty about costs. Culling rates from de-duplication and blind keyword searches are unpredictable and vary widely, meaning that some projects will cost more than expected while others will cost less. But every project has a budget that’s determined up front and, as any litigation support manager will tell you, you get much less credit for being under budget than you get pain for going over budget. That’s why cost certainty is one of the leading requests from anyone involved in electronic discovery.

Finally, excluding data based on a single round of keyword searches and date filters is not in line with The Sedona Conference best practices. Rather, Sedona recommends that customers iterate their keywords and culling strategies to hone them appropriately.

Iron Mountain Stratify OnPoint

It is not yet possible to do the same detailed analysis on Stratify’s OnPoint which offers “free early stage filtering”, because it’s impossible to tell exactly what that means. In its artfully-worded press release and data sheet, Stratify promises to provide “free processing and loading of unlimited data for early stage filtering”. Does that include de-duplication? Does that include any keyword searching? My guess is “no”, in which case all they are really doing for free is offering to load data into their review platform so that they can then charge you – not a very compelling offer. But if anyone does know the answer to these questions, or if Stratify would like to clarify exactly what’s being offered for free, then please let me know and I’ll post an update.

Once data is in Stratify’s system, it charges a “one-time fee starting at $500 per gigabyte” for “reviewable data”. But it does not say if that’s the only fee. What about monthly hosting charges? Fees for additional reviewers? Again, it’s not yet clear what the downstream cost of review really is using Stratify, so it’s impossible to know whether this is a good deal.

If there’s one lesson from all of this, it’s “buyer beware”. Just as when you buy a car, sign up for a credit card, or click on that offer to get more corn on Farmville, you need to look beyond the “free offer” and understand what it’s really going to cost you.

Manual Collections of ESI in Electronic Discovery Come under Fire

Monday, May 17th, 2010

Jason R. Baron was a keynote speaker at a recent electronic discovery summit and he mentioned an electronic data discovery topic that “ought to be blogged about.”  So, with that kind of softball I had to take a swing, particularly because it’s been a topic we (at e-discovery 2.0) have been discussing lately.

The genesis of this blog (per Jason) is the recent “skepticism” evidenced by the bench regarding the defensibility of custodian based collections.  ARMA has a good piece on this very topic, entitled “Is ‘Manual’ Collection of ESI Defensible?”  The core notion is that the tried and true practice of custodian based ESI collection is now under fire by courts, which appear to be looking at this practice with an increasing level of distrust.

“While it is common for companies to use automated data-collection software and hardware, some corporate litigants opt for more informal, “manual” collection methods (i.e., searches performed by individual records custodians) when responding to ESI requests. Companies may choose the manual collection of ESI to reduce costs, particularly if they have limited levels of litigation or lower risk levels posed by the litigation itself.”

While there’s no dispute that the “automated” collection methods available in litigation software referenced above have a number of features that make this approach more efficient, the question is whether a “manual” (i.e., custodian based) collection process is somehow less defensible.  If this is truly the case, then many midsized companies without the budget to purchase such e-discovery applications will inherently be found deficient – which is a daunting notion.

Take the recent case of Ford Motor Co. v. Edgewood Properties Inc., 257 F.R.D. 418 (D.N.J. 2009) where the dispute arose out of the demolition of a Ford assembly plant in New Jersey.  Ford and Edgewood entered into a contract whereby Ford agreed to provide 50,000 cubic yards of concrete to Edgewood in exchange for Edgewood removing it from the site.  When the concrete turned out to be contaminated, the dispute started in earnest.

The crux of Edgewood’s complaint was that it was unhappy with Ford’s production and somehow suspected that the dearth of documents was due to the electronic data collection process.  Edgewood sought to “’confirm the adequacy of Ford’s manual document collection process’ by using a third-party vendor to perform keyword searches on documents not in the existing repository of ESI, but instead, documents within the possession of certain Ford custodians.”

To reconcile the dispute the court looked to the Sedona Conference’s work in the area:

“In The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, Practice Point 1 states that “[i]n many settings involving electronically stored information, reliance solely on a manual search process for the purpose of finding responsive documents may be infeasible or unwarranted. In such cases, the use of automated search methods should be viewed as reasonable, valuable, and even necessary.”(emphasis added). Once again, the Court confronts this peculiar situation insofar as Edgewood has a point that the document collection method used by Ford is not necessarily contemplated under the Sedona Principles, but that agreement by the parties at the outset as to the mode of collection would have been the proper and efficacious course of action.  However, “[a]bsen[t] agreement, a [responding] party has the presumption, under Sedona Principle 6, that it is in the best position to choose an appropriate method of searching and culling data.”

Accordingly, the court found that the lack of agreement coupled with Ford being in the best position to make a call about the methodology, was a deciding factor in generally upholding Ford’s manual collection process.

“It would be improvident at this juncture to grant Edgewood the relief it seeks when it has not shown any indicia of bad faith on the part of Ford. To countenance such a holding would unreasonably put the shoe on the other foot and require a producing party to go to herculean and costly lengths (especially in a document-heavy case such as this) in the face of mere accusation to rebut a claim of withholding. This scenario is not contemplated by the Federal Rules.”

While Ford wasn’t penalized for its manual collection, this practice has come under fire in several other opinions.  In the highly controversial case of Phillip M. Adams & Assoc., LLC v. Dell, Inc., 621 F. Supp. 2d 1173 (D. Utah 2009) custodian based collection/preservation policies were similarly under fire.

“ASUS’ practices invite the abuse of rights of others, because the practices tend toward loss of data. The practices place operations-level employees in the position of deciding what information is relevant to the enterprise and its data retention needs. ASUS alone bears responsibility for the absence of evidence it would be expected to possess. While Adams has not shown ASUS mounted a destructive effort aimed at evidence affecting Adams or at evidence of ASUS’ wrongful use of intellectual property, it is clear that ASUS’ lack of a retention policy and irresponsible data retention practices are responsible for the loss of significant data.”

Adams was in fact cited by Judge Scheindlin in her latest opus Pension Comm. of the Univ. of Montreal Pension Plan v. Banc of America Sec. LLC, No. 05 Civ. 9016, 2010 U.S. Dist. Lexis 4546, at *1 (S.D.N.Y. Jan. 15, 2010), where she found fault with the Plaintiff’s reliance on manual collections:

“This instruction does not meet the standard for a litigation hold. It does not direct employees to preserve all relevant records – both paper and electronic – nor does it create a mechanism for collecting the preserved records so that they can be searched by someone other than the employee.  Rather, the directive places total reliance on the employee to search and select what that employee believed to be responsive records without any supervision from Counsel.”

From the foregoing, it’s probably too early to call the skepticism over manual collection a trend per se.  Certainly, lobbing a preservation notice over the proverbial wall to custodians without the requisite level of supervision is a recipe for disaster.  Education (about the matter and the required tasks), compliance (with the preservation instructions) and ongoing monitoring (to ensure that compliance continues over time) are all critical responsibilities that must be thoughtfully undertaken by counsel for a defensible ediscovery process.

The question then becomes, is the problem here really about the “manual” collection efforts by the custodians or more simply the fact that they aren’t supervised with the requisite degree of care?  If this is the case, which I’d opine that it is, then “properly executed” manual collections should be fine (i.e., defensible).

But, as Ford indicates, if your company is going to rely upon a manual collection modus operandi, then it may be advisable to let the opposition in on the use of this tactic.  This approach may be mandated by local rule or it may just be the type of transparent cooperation that’s all the rage these days.

Learn More On Litigation Support Software & Electronic Discovery Litigation

New York State Court Issues Report Calling for Extreme E-Discovery Makeover

Wednesday, April 28th, 2010

The New York state court looked in the mirror recently and didn’t like what it saw.  While it’s hard to imagine the self-dubbed “center of the universe” finding flaws with anything… apparently e-discovery has caused the Big Apple to take serious stock of the situation.  In a report entitled ELECTRONIC DISCOVERY in the NEW YORK STATE COURTS, Chief Judge Jonathan Lippman and Chief Administrative Judge Ann Pfau do an excellent job laying out the nature of the problem in a 24 page report.  Their initial findings in many ways mirror those of the American College of Trial Lawyers Task Force on Discovery (“Task Force”) and their survey of the Fellows of the American College of Trial Lawyers (“ACTL”).

“Electronic discovery (“e-discovery”) has for some time been changing the face of modern litigation. It is a major, if not the predominant, factor behind rising litigation costs and delays and presents serious challenges to the court system’s ability to resolve disputes ranging from commercial matters to personal injury cases, in an efficient, cost-effective manner.”

Fortunately, the Report recognizes the ubiquity of the vexing e-discovery challenges.

“[T]he volume of electronically stored information (“ESI”) has increased exponentially over the last decade, along with the amount of ESI potentially relevant to legal disputes. But while it is inexpensive to store immense quantities of ESI, it can be extremely expensive in the context of litigation to identify, preserve, and collect potentially relevant ESI and to have it reviewed for responsiveness and privilege by attorneys and paralegals prior to production to another party.”

But surprisingly, they’ve taken their shortcomings personally, and the seriousness apparently threatens New York’s standing in the legal community.

“Interviews with leading judges, law clerks, and practicing lawyers from around the state strongly suggest that the New York court system’s standing as a leading forum of both national and international litigation is at stake. … Those same parties and lawyers appear to be turning away from New York State courts for the greater sense of certainty and ability to handle massive e-discovery disputes that the Federal courts, and to a lesser extent, other state courts with more developed e-discovery practices, can provide.”

The report, founded upon “extensive research and interviews with experts in electronic discovery”, addresses the problems of electronic discovery, including cost and delay, and provides several recommendations on how “the courts can manage e-discovery in a more expert, efficient and cost-effective manner within the framework of existing law.”

1. Establish an E-Discovery Working Group

This proposed step is one of the more interesting since the goal is to create “a working group of e-discovery experts that would serve as a resource for the court system and support its efforts to improve the management of e-discovery.”  This Working Group would have a very expansive (perhaps too much so) roster:

  • Judges, court attorneys, and court clerks drawn from both the Commercial Division and other courts around the state that handle electronic discovery issues (and perhaps one or more judges/court personnel with little or no e-discovery experience);
  • Lawyers with extensive experience litigating cases involving large volumes of ESI;
  • One or more CPLR Advisory Committee members with an electronic discovery background;
  • Medical malpractice, matrimonial, criminal, mass tort, and employment law practitioners, because of the increasing frequency and importance of electronic discovery in these practice areas;
  • General counsel familiar with the issues affecting corporate clients who are heavy-ESI producers, particularly in the financial services and health care industries;
  • Forensic computer/e-discovery specialists who typically are hired for large electronic discovery productions, but can share their substantive technical knowledge and familiarity with the latest technological/forensic trends;
  • A mix of newer and more experienced practitioners, including one or two more experienced practitioners with limited technical proficiency;
  • Bar association representatives who have studied and issued reports on electronic discovery;
  • Federal practitioners and/or federal magistrates to offer the federal courts’ perspective;
  • An academic who has studied and written about electronic discovery;
  • Representatives of the Advisory Group to the New York State and Federal Judicial Council, which works to promote awareness about differences and commonalities in law practice between the state and federal judiciaries;
  • A member of The Sedona Conference®, a national group of jurists, lawyers, experts and academics considered to be at the cutting edge of electronic discovery issues;
  • Representatives of the Attorney General’s and/or District Attorneys’ Offices who are familiar with how electronic discovery is affecting their caseloads.

Assuming they can put together this dream team, the next challenge (beyond finding times to meet) would be to harmonize all the differing perspectives, which certainly won’t be easy.

2. Improve the Preliminary Conference

The Preliminary Conference (“PC”) was roundly felt to have value, but there were both short term and long term recommendations for change.  In the near term, the Report concludes that new language should be added to Commercial Division Uniform Rule 1 and to Rule 202.12(c)(3) stating that:

“Counsel appearing at the PC should be sufficiently versed in matters relating to their client’s technological systems to competently discuss with the court and opposing counsel all issues relating to e-discovery. Counsel may, in appropriate cases, supplement their ability to address these issues at the PC by bringing a client representative or outside expert with such knowledge.”

Assuming the short term fixes don’t remediate things completely, the Report recommends two additional steps, each to be piloted.  First, one pilot project should require an Initial Disclosure (similar to FRCP 26(a)(1)) for all parties relating to electronic discovery issues, which would require the parties to detail the following, in advance of the PC:

• Who the party’s key IT people are;

• Whether, and to what extent, the party has implemented preservation measures to avoid spoliation of the information relevant to this case;

• Which substantive witnesses the party is likely to call who are likely to possess ESI, and the location of that ESI (e.g., laptops, wireless handheld devices);

• What types of computer systems (including e-mail, word processing and spreadsheet software) and other technologies the party uses that may have created documents relevant to the litigation; and

• Whether the party expects to claim that certain ESI relevant to the case is inaccessible due to the form in which it is maintained (e.g., disaster recovery backup tapes, legacy data).

The other pilot program would require an “Affirmation of E-Discovery Compliance” that would be jointly signed and certified by the lawyers for each party, and provide the court with three lists.

“The first list would contain those e-discovery matters, contained in Rule 8(b) or Rule 202.12(c)(3), which the parties were able to meet-and-confer about and resolve. The second list would contain similar matters that, despite meeting and conferring, the parties could not agree upon or resolve and that need the court’s involvement. The third list would be any additional issues that, because of the disagreements described in the second list, the parties could not yet reach and resolve. The document would also chronicle the parties’ attempts to meet-and-confer, and indicate whether, and to what extent, client personnel and IT specialists were involved.”

While there are a few other minor suggestions, one of the most interesting is the shout out to The Sedona Conference®.  The Report concludes that “judges and practitioners applauded the work of The Sedona Conference®, particularly its emphasis on changing the litigation culture and fostering dialogue, cooperation, and transparency in e-discovery.”  The Report recommends the appointment of a representative to The Sedona Conference®, which despite the foregoing “should not be interpreted to mean that the court system necessarily endorses that organization’s work and proposals. Rather, the court system’s appointee would bring back materials for consideration here in New York, to be accepted, rejected, or modified, as appropriate.”

All in all, the New York state court appears to have taken a reasoned and measured approach to address their candid shortcomings.  This type of critical analysis should be taken by more jurisdictions to determine where process gaps still exist.  Only then can a better future state be divined.


7th Circuit Launches an Electronic Discovery Pilot Program

Thursday, October 15th, 2009

Recently, I attended the Sedona Conference’s annual meeting in Atlanta where, amongst other interesting topics, there was a discussion of local rules developments and, in particular, the Seventh Circuit’s new Electronic Discovery Pilot Program (“Pilot Program”).  The Pilot Program was launched October 1, 2009 and seems to be a model for collaboration, since it was developed by eliciting input from a number of disparate groups:

“(a) continuing comments by business leaders and practicing attorneys, regarding the need for reform of the civil justice pretrial discovery process in the United States, (b) the release of the March 11, 2009 Final Report on the Joint Project of the American College of Trial Lawyers Task Force on Discovery (“Task Force”) and the Institute for the Advancement of the American Legal System at the University of Denver (“IAALS”), and (c) The Sedona Conference® Cooperation Proclamation.”

The impetus for the Pilot Program was the “broken” nature of the electronic discovery process, with the belief that better collaboration and cooperation would certainly help remediate the situation.

“The goal of the Principles is to incentivize early and informal information exchange on commonly encountered issues relating to evidence preservation and discovery, paper and electronic, as required by Rule 26(f)(2). Too often these exchanges begin with unhelpful demands for the preservation of all data, which often are followed by exhaustive lists of types of storage devices. Such generic demands lead to generic objections that similarly fail to identify specific issues concerning evidence preservation and discovery that could productively be discussed and resolved early in the case by agreement or order of the court. As a result, the parties often fail to focus on identifying specific sources of evidence that are likely to be sought in discovery but that may be problematic or unduly burdensome or costly to preserve or produce.”

What I really like about the Pilot Program is that it strives to be both prescriptive and practical, which should hopefully avoid the type of ambiguity often exploited by obstreperous counsel.  For example, there is an entire section on early case assessment (ECA) principles, which require discussion of:

  • Production issues
  • Identification of electronically stored information (ESI)
  • The scope of preservation
  • The meet & confer process

There’s also the relatively novel requirement that counsel designate an e-discovery “liaison” to work with the parties to coordinate and flesh out germane e-discovery issues.  Regardless of whether the e-discovery liaison is an attorney, a third party consultant, or an employee of the party, the e-discovery liaison(s) must:

“(a) be prepared to participate in e-discovery dispute resolution;

(b) be knowledgeable about the party’s e-discovery efforts;

(c) be, or have reasonable access to those who are, familiar with the party’s electronic systems and capabilities in order to explain those systems and answer relevant questions; and

(d) be, or have reasonable access to those who are, knowledgeable about the technical aspects of e-discovery, including electronic document storage, organization, and format issues, and relevant information retrieval technology, including search methodology.”

Needless to say, this requirement alone should make marked improvements in the e-discovery dialogue, which unfortunately seems like it’s occurring (literally) among participants who both speak different languages and don’t realize it.

Finally, what makes the Pilot Program unique is that its Principles will be subjected to testing during the phases of the Pilot Program, which is scheduled to end on May 1, 2010 (for the first phase).

This project certainly seems like it’s on the right track and, pending feedback from the bench and bar, it could serve as a model for local jurisdictions everywhere.

Learn More On Frcp Electronic Discovery.

As the Electronic Discovery World Zurns

Wednesday, July 29th, 2009

Judge Grimm’s Victor Stanley case was lauded by many as one of the most significant electronic discovery cases of 2008, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators.

“[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

Despite legions of articles and blogs on the topic, at least certain portions of the bench haven’t taken heed.  In the case In re: Zurn Pex Plumbing Products Liability Litigation, 2009 U.S. Dist. LEXIS 47636 (June 5, 2009) (hereinafter “Zurn”), U.S. District Judge Ann Montgomery receives points for understanding some basic e-discovery tenets around recall and precision, but then mysteriously goes where “angels fear to tread” by suggesting her own search terms.

Examining the case facts in more detail, Zurn is a class action products liability case where discovery was bifurcated (as is often the case – see Spieker v. Quest Cherokee) to first cover the class “certification” component.  Initially, the Magistrate partially closed the door on broader ESI discovery, stating that “while ESI may prove to be relevant to the first stage of discovery, we cannot meaningfully make that prediction now, and require the parties to engage in what could be vastly more expensive, and yet utterly futile, discovery.”  However, the Magistrate didn’t shut the door entirely, suggesting that “should the parties uncover voids in the information disclosed in hard copy form, they are . . . at liberty to press for further discovery including electronically stored information.”

Despite complying with Sedona’s Cooperation Proclamation (“The parties have worked amicably throughout the discovery process”), opposing counsel still came to loggerheads when plaintiff found “voids” in the initial paper productions via third party discovery.  The plaintiff brought a motion to compel ESI discovery and the defendant objected, stating two primary arguments: (1) the Magistrate earlier ruled out ESI discovery and (2) if they had to perform ESI discovery it would be unduly burdensome/expensive.

Judge Montgomery summarily rejected the first argument, but was concerned about the burden surrounding the proposed ESI discovery.  Here, the calculations get a bit confusing, but plaintiff’s request would have resulted in 361 gigabytes of ESI from employee email sources, as well as shared “J” and “K” drives.  The defendant multiplied the gigabyte number by 75,000 pages per gigabyte (about 27 million pages) and asserted that the work would have required “approximately seventeen weeks and cost $ 1,150,000, exclusive of vendor collection and processing costs, to review and process the data.”  Assuming a rather modest $1,000 per gigabyte for processing and hosting costs, defendants could’ve added nearly another $400,000 for the project.
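Since the numbers get confusing, here is the arithmetic laid out as a short sketch. The inputs are the figures quoted above; the $1,000 per gigabyte rate is my own assumption from the text, not a number from the opinion, and the function name is purely illustrative. Note that 361 gigabytes at $1,000 each comes to $361,000, which the paragraph above rounds up to roughly $400,000.

```python
def zurn_burden_estimate(gigabytes, pages_per_gb, review_cost, processing_per_gb):
    """Lay out the page count and the add-on processing cost side by side."""
    pages = gigabytes * pages_per_gb
    processing = gigabytes * processing_per_gb
    return {
        "pages": pages,
        "processing_cost": processing,
        "review_plus_processing": review_cost + processing,
    }


estimate = zurn_burden_estimate(
    gigabytes=361,            # ESI volume from plaintiff's request
    pages_per_gb=75_000,      # defendant's conversion factor
    review_cost=1_150_000,    # defendant's stated review figure
    processing_per_gb=1_000,  # assumed rate, not from the opinion
)

for label, value in estimate.items():
    print(f"{label}: {value:,}")
```

Whether the $1,150,000 figure actually corresponds to all of those pages is exactly what the court found unclear, so the combined total should be read as an upper-end illustration only.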

Ultimately, the court was not persuaded by the supporting affidavits, nor the attorney’s representations about the resulting burden:

“It is unclear whether Zurn’s cost and time numbers are based on a review of 27 million pages of documents, the 3.6 million pages of documents limited to the J Drive and custodians’ emails, or a smaller sample of document pages likely to be flagged as a result of a search for certain relevant terms pro-posed by Plaintiffs. The affidavit of Ms. Freestone, an attorney and not an expert on document search and retrieval, is not compelling evidence that the search will be as burdensome as Zurn avers.”

The 361 gigabytes apparently resulted from “hits” corresponding to plaintiff’s 26 search terms.  The court correctly identified that those terms had precision issues (“many of Plaintiffs’ proposed search terms will likely produce a large number of ‘hits’ that have limited relevance in the case.”)

Unfortunately, in an effort to increase the search precision, the Judge did not take heed of Judge Grimm’s warning and surprisingly took matters into her own hands: “the Court will limit the search to the following fourteen terms based on the likelihood that they will  produce relevant documents without including a vast number of documents that are likely irrelevant to the litigation.”  Here is the Judge’s list of keywords:

(1) AADFW,
(2) Corrosion,
(3) Corrosive,
(4) Corrosive Water,
(5) Crack,
(6) De-zinc,
(7) Dezincification,
(8) DZR,
(9) Fail,
(10) IMR,
(11) Leak,
(12) MES,
(13) SCC,
(14) Stress corrosion cracking

Without looking at the underlying data, it’s clear from the outset that Judge Montgomery didn’t craft a good search strategy (as Judge Grimm might have predicted).  For example, terms 2, 3, 4 and 14 could’ve been captured by a single stemmed search using the term “corros*.” Without such a stemmed search approach, the terms would probably have been run singly in the proposed protocol, meaning that each one would’ve had tremendous duplication, thereby resulting in wasted attorney review time and processing costs.
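To illustrate the overlap, here is a minimal sketch that checks which of the fourteen court-ordered terms a single stemmed search would subsume. It uses Python's standard re module to mimic a trailing wildcard; real review platforms have their own wildcard syntax and matching rules, so treat the helper names and the translation to a regular expression as my assumptions.

```python
import re

# The fourteen terms from the court's order, in the order listed above.
TERMS = [
    "AADFW", "Corrosion", "Corrosive", "Corrosive Water", "Crack",
    "De-zinc", "Dezincification", "DZR", "Fail", "IMR",
    "Leak", "MES", "SCC", "Stress corrosion cracking",
]


def wildcard_to_regex(pattern):
    """Translate a trailing-wildcard term such as 'corros*' into a regex."""
    stem = re.escape(pattern.rstrip("*"))
    # A trailing * matches any word ending; otherwise require a whole word.
    suffix = r"\w*" if pattern.endswith("*") else r"\b"
    return re.compile(r"\b" + stem + suffix, re.IGNORECASE)


def covered_terms(pattern, terms):
    """Return the 1-based positions of terms the stemmed search would hit."""
    regex = wildcard_to_regex(pattern)
    return [position for position, term in enumerate(terms, start=1)
            if regex.search(term)]


print(covered_terms("corros*", TERMS))  # [2, 3, 4, 14]
```

Any document containing term 14 necessarily contains term 2 as well, so running the terms one at a time returns the same documents more than once.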

Judge Montgomery did recognize the potential error of her ways and gave the parties an out:

“The parties may decide on a different set of fourteen terms if they choose to do so. Additionally, if the search, as ordered by the Court, proves to be overly burdensome or costly, Zurn may renew its objection by presenting the Court with specific information including evidence from computer experts on applying the search terms, the number of documents identified, and the cost and time burdens of vetting documents.”

This “specific evidence” language seems to track notions from Sedona’s search best practices protocol, which prescribes sampling and iterative search term refinement.  What is surprising is that knowing this she would nevertheless blindly proffer the 14 term search strategy.  Instead, she should’ve quoted Victor Stanley and required the parties to come up with a data driven approach that met requisite precision and recall metrics.
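For readers who want the two metrics spelled out, here is a minimal sketch of how precision and recall would be computed from a reviewed sample. The counts in the example are invented for illustration; they are not from Zurn or any other case.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


# Hypothetical sample: 400 hits reviewed, 120 of them relevant,
# plus 30 relevant documents that the search terms missed.
precision, recall = precision_recall(
    true_positives=120, false_positives=280, false_negatives=30
)
print(f"precision {precision:.0%}, recall {recall:.0%}")
```

Iterating on the terms and re-measuring against a sample is what turns a keyword list from a guess into something defensible.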