Posts Tagged ‘EDRM’

Clearwell Doubles Down on Review

Monday, August 22nd, 2011


(Editor’s note: This special guest post was written by Chitrang Shah, Clearwell Principal Product Manager. He is an RIT alum and avid hiker who works with our engineering team and lead customers to optimize the product for large-scale review. – Kurt)

As we’ve previously shared, our product strategy throughout 2009 and 2010 was to expand the product footprint across the EDRM as customers were demanding a single, end-to-end eDiscovery product. During this period we successfully expanded from our roots in processing, search and analysis to review and production (August 2009), identification and collection (September 2010) and legal hold workflow (March 2011). Over the last several months, our focus has been to go deep in each of these modules and provide features that deliver even greater return on investment to our customers.

Today, I am excited to announce significant new features and feature enhancements to the Clearwell Review and Production Module and say a few words about what motivated us to build these features and how they enable our customers to further streamline their legal review workflow.

There are several exciting features in this release, but I would like to highlight three in particular:

1. Ability to seamlessly import production load files

Most matters require reviewing relevant documents alongside the documents received from third parties, opposing parties, and even previous litigations. With the new load file import feature, users can now streamline the process of importing load files with three simple steps.

In Step 1, a step-by-step wizard-like interface guides users through the selection of formatting information such as field delimiters and nested value delimiters, metadata information such as bates numbers, family relationships, tags, folders and any number of custom attributes, and content information such as images, extracted text and native files. When the load file has both extracted texts and native files, the wizard gives users an option to specify which content should be used for searching.

In Step 2, the system performs a deep validation of the load file and generates a report documenting any inconsistencies such as missing bates numbers or missing values for required fields found in the load file. As a result, customers have the ability to quickly find and fix any issues with the load file before the import begins.

In Step 3, the system imports the documents and builds analytics. Once this step completes, the imported documents, including all metadata and content, are available for viewing and searching.

All the analytics capabilities customers are familiar with, such as discussion threads and concept search, are also available for documents imported from load files. This allows users to quickly discover documents in the load file that are conceptually similar to natively processed documents, for example.
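
To make the validation pass in Step 2 more concrete, here is a minimal sketch in Python. It is illustrative only and is not Clearwell's implementation: the field names (`BegBates`, `Custodian`, `Tags`), the pipe and semicolon delimiters, the duplicate-number check, and the `validate_load_file` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical required fields; a real load file specification would define these.
REQUIRED_FIELDS = ("BegBates", "Custodian")


def validate_load_file(text, delimiter="|", nested_delimiter=";"):
    """Parse a delimited load file held in a string.

    Returns (records, issues), where each issue is a (line_number, message)
    tuple so that problems can be fixed before any import begins.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records, issues = [], []
    seen_bates = set()
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                issues.append((line_number, f"missing value for {field}"))
        bates = (row.get("BegBates") or "").strip()
        if bates and bates in seen_bates:
            issues.append((line_number, f"duplicate Bates number {bates}"))
        seen_bates.add(bates)
        # Split nested values (several tags in one field) into a list.
        raw_tags = row.get("Tags") or ""
        row["Tags"] = [tag for tag in raw_tags.split(nested_delimiter) if tag]
        records.append(row)
    return records, issues


sample = (
    "BegBates|Custodian|Tags\n"
    "ABC0001|Smith|Hot;Privileged\n"
    "|Jones|\n"
    "ABC0001|Lee|Hot\n"
)
records, issues = validate_load_file(sample)
for line_number, message in issues:
    print(f"line {line_number}: {message}")
```

Running the sketch on the sample reports a missing Bates number on line 3 and a duplicate on line 4, which is the kind of report a reviewer would want before committing to an import.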

2. Support for large scale reviews and productions

As the volume of electronically stored information (ESI) continues to grow, our customers find themselves reviewing and exporting more and more documents, and they need a solution that can cope with the massive growth in data. At the same time, they don’t want to spend large sums of money building a server farm in anticipation of the growth. They want the flexibility to add capacity when needed and remove it when not needed.

Clearwell’s scale-out architecture enables administrators to easily add appliances and allocate them to a particular matter and to a specific task using a point-and-click interface.

For example, if an administrator needs to increase the number of reviewers from 200 to 400 in order to meet a tight deadline, he or she can add 2 appliances to the cluster and assign them to review. Once the review completes, the administrator can re-assign these appliances to production, allowing users to meet deadlines while reducing their overall hardware costs.

This flexibility allows our customers to maximize the use of their hardware resources while providing infinite review, export and production scalability.

3. Streamlined management of exports and productions

Clearwell provides powerful export options, and while our customers use them extensively for creating a variety of different production formats, they typically standardize on a few. Clearwell’s new case export and production templates provide a quick and easy way for case administrators to define the export format once and use it across multiple cases. When exporting documents, users can simply select a template from the list of visible templates in that case. This capability significantly reduces the overhead associated with managing export formats and allows our customers to produce documents in a consistent format across multiple matters.

Additionally, new production pre-mediation reports automatically identify problem documents and group them by issue type for quick resolution. This enables users to preemptively identify and resolve document production issues without delaying entire productions.
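
The two ideas in this section, defining an export format once and grouping problem documents by issue type, can be sketched in a few lines of Python. Everything here is hypothetical: the `ExportTemplate` fields, the case names, and the issue types are invented for illustration and do not describe the product's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportTemplate:
    """Hypothetical export format, defined once and reused across cases."""
    name: str
    image_format: str
    include_natives: bool
    load_file_type: str


def group_by_issue(problem_docs):
    """Group (doc_id, issue_type) pairs by issue type for quick triage."""
    grouped = defaultdict(list)
    for doc_id, issue_type in problem_docs:
        grouped[issue_type].append(doc_id)
    return dict(grouped)


# One template shared by several cases (all names are illustrative).
standard = ExportTemplate("Standard TIFF", "TIFF", True, "DAT")
case_templates = {"Case A": standard, "Case B": standard}

problems = [
    ("DOC-1", "missing image"),
    ("DOC-2", "unreadable native"),
    ("DOC-3", "missing image"),
]
report = group_by_issue(problems)
for issue_type, doc_ids in report.items():
    print(f"{issue_type}: {', '.join(doc_ids)}")
```

Because both cases point at the same template object, a format change is made in one place, and the grouped report lets someone fix all documents with the same problem in one pass.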

Says Wendy Butler Curtis, chair of Orrick, Herrington & Sutcliffe’s eDiscovery Working Group, “Legal review is one of the most challenging phases of the eDiscovery process. As electronic data volumes continue to grow, it is increasingly important to leverage technologies that can streamline and improve legal review, ensure defensibility and reduce costs. Solutions like the Clearwell eDiscovery Platform enable legal teams to create an iterative eDiscovery workflow that allows for more efficient and effective large-scale review.”

We will be showcasing the new features at ILTA (Booth 816) this week in Nashville, so come see us and let us know what you think.

(Chitrang Shah is a Principal Product Manager at Clearwell Systems, now a part of Symantec, and the lead Product Manager for Clearwell’s Processing & Analysis and Review & Production Modules)

Gibson Dunn’s Mid-Year eDiscovery Report Highlights Changes in Sanctions Landscape

Monday, August 15th, 2011

In past years we’ve covered Gibson Dunn’s Mid-Year E-Discovery Report which is always a good read, chock full of take-aways about the eDiscovery market.  In my mind, they do an excellent job of synthesizing the ever-expanding volume of case law and comparing those trends with historical averages.  This year’s report is no exception, and for those who don’t get to read all the cases, this is a stellar way to keep up on eDiscovery trends.  Without trying to summarize the entire 23 page document, there were a number of findings that stood out and should be perused by anyone with even a passing interest in the space.

Legal Holds/Preservation. As we all know, eDiscovery sanctions (at least here in the US) are critical business/legal drivers, particularly with regard to the legal hold area (which is the riskiest part of the EDRM).  As the Gibson report points out, the actual award of sanctions has remained relatively flat (56% in the first half of 2011 versus 55% for the full year in 2010) –  but, more important than this relatively stable metric, it’s very clear that the plaintiffs’ bar has caught on to the ability to win cases by revealing shoddy (or just undocumented) legal hold procedures, even in some instances where data isn’t lost.  This is why the report notes a dramatic increase in the seeking of eDiscovery sanctions – 68 at mid-year 2011 versus 31 at mid-year 2010.  This more than doubling of attempts to pierce an entity’s legal hold regime should be a wake-up call to in-house practitioners and chief legal officers, since the attempt and success rates will likely only increase over time.

While there is still some considerable debate, at least for those following Judge Scheindlin’s Pension Committee logic, anything less than a formal, written legal hold policy is per se negligent.  Although it’s conceivable that  a reviewing court won’t use this rigorous standard, anything less formal will strike most organizations as simply too risky.  Ongoing compliance with the legal hold process is also another difficult task for many organizations, one which is considerably easier with an automated solution that is able to track acknowledgements and send reminders over time.  It’s all too easy for companies to think that once they’ve discharged their initial legal hold duty they’re in the clear – but as these obligations morph (with more custodians/data types) and elongate (from months to years) over time, keeping on top of the legal hold processes becomes that much more important.

Sanctions. The Gibson report also importantly points out that there’s currently a split in jurisdictions: some courts levy sanctions only on a showing of bad faith, while others require merely proof of negligence.  Here, the important take-away is that a defendant entity doesn’t typically get to forum shop and therefore can’t really tell which type of jurisdiction it will end up in as a litigant.  So, it needs to build its eDiscovery processes to meet the high water (i.e., most rigorous) standard.  In most cases, it’s therefore prudent to be prepared to be sanctioned for merely negligent conduct – anything less can potentially be safe but that risk calculation needs to be considered carefully.

The other perilous part of the equation is that once sanctions are deemed warranted, the court has almost unlimited discretion to levy whatever blend of sanctions it thinks is appropriate.  In Green v. Blitz, for example, the court ordered a laundry list of sanctions, some of which were pretty unfathomable:

1. Defendant had to pay plaintiff $250,000

2. Defendant had to provide a copy of the court’s order to plaintiffs “in every lawsuit proceeding against it” for the past two years

3. Defendant had to file the court’s order in every case that it is involved in for the next 5 years

The bottom line is that sanctions, despite the fear factor, can be used to drive positive proactive conduct – namely in the shape of eDiscovery best practices.

Outside Counsel Duties. Here, the Gibson report notes that outside counsel’s Zubulake duties continue to increase over time, with a number of cases continuing the trend of holding attorneys responsible for ensuring that their clients properly implement legal holds, institute sound sampling protocols and conduct sufficient quality control steps.  This line of discussion can be useful when talking to outside counsel where we’re starting to see how their increasing responsibilities can lead to malpractice exposure, as seen in the recent McDermott case.

Search/Analysis. Lately there’s been a ton of buzz about predictive coding, but (despite the hype) it still doesn’t appear ready for prime time yet.  The Gibson report noted that there were no reported cases that addressed the use of predictive coding or other advanced search technologies.  My sense is that without some semblance of judicial approval or strong client backing, outside counsel (who are concerned about their malpractice exposure, per above) aren’t quickly going to be the first ones into the pool.  Unless an enterprise client demands that they use this type of technology, most will wait for judicial approval and that’s probably still a way off.  While next generation search technologies are more promise than reality right now, there is still a mandate to implement a defensible search methodology.  These are needed initially to demonstrate transparency in the eDiscovery process and to then withstand the challenges levied by counsel in the case of an inadvertent production.

In sum, the Gibson report shows the ongoing maturation of the eDiscovery space.  But, any niche market led by case law and/or attorneys deciding to adopt new technologies won’t be quick to change.  In many instances, therefore, the best practices will be decided by a combination of standards bodies and vendors who are being pushed by their more forward thinking clients to get and stay on the cutting edge.

Apple, Code Name K48 and E-Discovery

Wednesday, June 22nd, 2011

According to a complaint filed by the U.S. government, the FBI secretly recorded an employee at one of Apple’s suppliers passing confidential information about the soon to be released Apple iPad in an October 2009 telephone conversation.  The recording, along with other evidence, led to the arrest of the employee and others on charges of wire fraud and conspiracy to commit securities fraud on December 16, 2010 as part of a major insider-trading investigation.  In the conversation, a director for Flextronics named Walter Shimoon is heard saying:

“they [Apple] have a code name for something new … It’s … It’s totally … It’s a new category altogether… It doesn’t have a camera, what I figured out. So I speculated that it’s probably a reader. … Something like that. Um, let me tell you, it’s a very secretive program … It’s called K, K48. That’s the internal name. So, you can get, at Apple you can get fired for saying K48.”

Four months later, the first Apple iPad, code named K48, was unveiled to the public.  To read more about the case background, read the press release issued by the U.S. Attorneys’ Office on December 16, 2010.

The case is interesting from an eDiscovery standpoint because it highlights challenges related to finding critical evidence as part of an investigation or lawsuit when people are intentionally using code words to hide information.  Finding or overlooking important documents that have been disguised can make or break your case, so determining whether or not key players are using code words is an important part of a thorough investigation.  Equally important to the investigation is segregating relevant and irrelevant documents quickly before key evidence is lost or destroyed without being required to conduct a painstaking page by page review of each document.

How Does Technology Help?

The good news is that even though technology innovation has resulted in massive data growth requiring the review and analysis of more documentary evidence during lawsuits and investigations, advances in eDiscovery technology have also made sifting through this information faster and easier.  In other words, technology can help solve the data growth problem technology created.

One of the newest advances is the use of “transparent concept search” technology to find important electronic files in lieu of basic “keyword” or “traditional” concept searching technology.  In many situations investigators or lawyers simply aren’t aware code words are being used to hide activity, so critical evidence is often overlooked.  For example, in the present case assume the investigator is unaware that “K48” is the internal code name used for the first iPad.  A simple keyword search for the term “iPad” may not retrieve critical documents about the “iPad” because the code name K48 is being used to disguise the product name.  If this is the only search methodology used, information could easily be overlooked during the investigation due to the limitations of simple keyword search technology.

On the other hand, running the same search using a traditional concept searching tool is likely to retrieve documents containing the word “iPad” as well as other conceptually related documents.  The problem is that the user has no ability to control the breadth of the search using traditional concept searching technology.  That means even though a traditional concept search for the term “iPad” is likely to include documents containing the terms “K48” and “iPad,” it is also likely to retrieve a large number of irrelevant documents containing terms like “iPod,” “iTouch” and “iTunes” that may appear to be conceptually related to the search term “iPad.”  The problem may seem trivial initially, but when investigators are required to read hundreds or thousands of irrelevant documents about the iPod, iTouch or iTunes in an effort to find relevant documents about the iPad, the time and cost of the investigation can skyrocket.

Next Generation Transparent Concept Search Technology

To solve this problem, next generation transparent concept search technology takes traditional concept searching a step further by empowering investigators to reap the advantages of traditional concept searching while actually reducing instead of increasing e-discovery expenses.  The secret is that transparent concept searching technology significantly reduces the time and expense resulting from over-inclusive document retrieval by allowing users to eliminate documents containing concepts that are not relevant to the intended search.  This is accomplished by providing a transparent view of concepts related to a search so that users can actually visualize and select (or deselect) the range of concepts to be included in a search before the search is executed.

For example, using transparent concept search technology to search for the term “iPad” would reveal conceptually related terms like “K48” just like traditional concept searching.  However, a transparent concept search would also provide a list of all concepts related to the keyword “iPad” prior to the search, such as “K48,” “iPod,” “iTouch,” “Shimoon,” “iTunes,” etc.  Prior to executing the search, the user could de-select irrelevant concepts and limit the search to “iPad”, “Shimoon”, “internal” and “K48” to make sure only the most relevant documents are retrieved. (See Figure 1).  In addition to decreasing the cost associated with segregating relevant and irrelevant documents, the transparent approach to concept searching results in strategic advantages for investigators and legal teams because the most relevant evidence is found quickly so cases can be assessed faster, with more accuracy, and before evidence disappears.

Figure 1: Transparent concept search reveals all concepts related to the keyword “iPad” so users can not only identify key documents they may have otherwise overlooked, but they can also select which concepts (“internal” “K48” “Shimoon”) to include in the search so only the most relevant documents are retrieved.
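
The difference between the three search styles can be shown with a toy sketch in Python. It is a simplification under stated assumptions: the five "documents" are invented and reduced to their key terms, and simple co-occurrence stands in for whatever concept model a real product uses. The function names `related_concepts` and `search` are hypothetical.

```python
# Toy corpus: each document is reduced to its key terms (invented data).
DOCS = {
    1: "ipad k48 internal",
    2: "k48 shimoon internal",
    3: "ipad ipod itunes",
    4: "ipod itunes",
    5: "ipad itouch",
}


def related_concepts(term, docs):
    """Terms that co-occur with `term` in at least one document.

    Co-occurrence is only a stand-in here for a real concept model.
    """
    related = set()
    for text in docs.values():
        words = set(text.split())
        if term in words:
            related |= words - {term}
    return sorted(related)


def search(terms, docs):
    """Return the ids of documents containing any of the given terms."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, text in docs.items()
                  if wanted & set(text.split()))


# 1. Plain keyword search misses document 2, which only uses the code name.
keyword_hits = search(["ipad"], DOCS)

# 2. Expanding with every related concept finds it, but also pulls in
#    document 4, which is only about the iPod and iTunes.
candidates = related_concepts("ipad", DOCS)
broad_hits = search(["ipad"] + candidates, DOCS)

# 3. Showing the candidates first lets the user keep only the useful ones.
selected = ["ipad", "k48", "internal"]
focused_hits = search(selected, DOCS)

print(candidates)    # ['internal', 'ipod', 'itouch', 'itunes', 'k48']
print(keyword_hits)  # [1, 3, 5]
print(broad_hits)    # [1, 2, 3, 4, 5]
print(focused_hits)  # [1, 2, 3, 5]
```

The focused search still surfaces the code-name document while leaving out the iPod-only one, which is the trade-off the figure illustrates.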

Conclusion

Not knowing what to search for as part of eDiscovery or investigations is often the biggest organizational challenge that basic keyword and traditional concept search technology has not been able to solve.  Next generation transparent concept search technology overcomes the inherent limitations of basic keyword and traditional concept searching technology by empowering users to uncover, assess, and review evidence faster and with more accuracy, thereby giving litigators or investigators new strategic advantages on every case.

Patents and Innovation in Electronic Discovery

Monday, June 13th, 2011

In the world of technology we live in, a huge amount of benefit is created when people apply certain well-known techniques to solve problems and create value to the broader community. Such techniques are often the result of painstakingly long and laborious research, driven primarily by academic institutions with private industry either funding such research directly or by co-opting them in their own work. When the industry as a whole recognizes a certain methodology, it gains popular usage.

In information retrieval, searching and retrieving relevant content from unstructured text has been a vexing problem, and we’ve had decades of the brightest minds applying their collective intelligence and the rigors of peer review to validate and establish the most effective way to solve a retrieval problem. And, research forums such as TREC, SIGIR and other information retrieval conferences establish a venue for advancing the state of the art. So, when Recommind announced that they have been issued a patent on Predictive Coding, I took notice, especially since it touches a nerve with those who believe research should be openly shared.

The patent lists six claims that describe a workflow whereby humans review and code a document and the coding decisions applied to the document sample are projected or applied to the larger collection of documents. Anyone who has even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback as a way to perform information retrieval has been studied for well over forty years, with a paper as early as 1968 by Rocchio J.J., titled Relevance Feedback in Information Retrieval. It falls under a category of methods broadly known as machine learning.

Any supervised machine learning system involves creating a training sample and using that sample to project into a larger population. The fact that one could claim patentable ideas on something that is so widely known and used is puzzling.  Any workflow that employs machine learning would include the steps of creating an initial control set, coding that by human review, and applying the learned tags to a larger population.  In fact, the Wiki article Learning to rank describes precisely the workflow that is claimed in the patent and as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper with iterative sampling based evaluation and automatic expansion of initial query.  In that paper, we describe exactly the workflow postulated by the six claims of the patent.
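
To show how ordinary this workflow is, here is a minimal sketch of it in Python: humans code a small sample, a model is learned from that sample, and the learned tags are applied to the rest of the population. The technique used is a plain nearest-centroid (Rocchio-style) classifier over word counts, chosen because it fits in a few lines. It is not the method claimed in the patent nor the one described in our TREC paper, and the sample documents and labels are invented.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average term weights across a list of document vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(coded_sample):
    """Build one centroid per label from human-coded (text, label) pairs."""
    by_label = {}
    for text, label in coded_sample:
        by_label.setdefault(label, []).append(vectorize(text))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, text):
    """Assign the label whose centroid is most similar to the document."""
    vector = vectorize(text)
    return max(model, key=lambda label: cosine(vector, model[label]))


# Step 1: humans code a small sample (invented examples).
coded = [
    ("merger pricing call with supplier", "responsive"),
    ("supplier pricing forecast attached", "responsive"),
    ("lunch menu for friday", "not responsive"),
    ("friday parking reminder", "not responsive"),
]
# Step 2: learn from the sample.
model = train(coded)
# Step 3: project the coding onto the uncoded population.
uncoded = ["pricing forecast for the merger", "parking lot closed friday"]
predictions = [predict(model, text) for text in uncoded]
print(predictions)  # ['responsive', 'not responsive']
```

A production system would add term weighting, iteration and quality checks, but the three steps are the same ones any supervised learner follows.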

In terms of other prior art that would potentially invalidate the patent, the list is long. Let’s start with Text Classification. Text Classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the Proceedings of the Sixteenth International Conference on Machine Learning, as well as in his book Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, published by The Springer International Series in Engineering and Computer Science.  Joachims is now a well-recognized Professor of Computer Science at Cornell University, and that work is widely cited as seminal in the area of machine learning and text classification. Interestingly, this work was cited by the Patent Examiner as prior art, but the inventors missed listing it. Nevertheless, that work and further work by several academics such as Leopold and Kindermann has already established the use of Support Vector Machines as a useful technique for machine learning. To claim the novelty of its use in automatically coding documents is, in my opinion, a hollow claim.

Another technology mentioned in passing is Latent Semantic Indexing (LSI). This is proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman R. in their paper, Indexing by Latent Semantic Analysis, in Journal of the ASIS, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching and text classification is also very widespread, and once again, it seems ridiculous to claim that it is something novel or innovative.

Next, let’s examine the use of sampling to validate the initial control set. Use of sampling for validation of a control set of documents is in fact such a widely known technique that most e-discovery productions employ sampling. In fact, the Sedona Commentary on Achieving Quality and the EDRM Search Guide recommend use of sampling to validate automated searches. Furthermore, several e-discovery opinions, such as Judge Grimm’s opinion in Victor Stanley [Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008)], suggest that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.
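
Sampling of this kind needs nothing exotic. The sketch below, in Python, draws a random sample from the documents an automated search did not return, treats the sample as hand-coded, and estimates how many responsive documents were missed, with a 95% interval from the normal approximation to the binomial. The population size, the number of responsive documents, the sample size and the function name `estimate_rate` are all assumptions chosen for illustration, and the normal approximation becomes unreliable when the observed rate is very close to zero.

```python
import math
import random


def estimate_rate(sample_labels, z=1.96):
    """Return (rate, low, high) for the share of True labels in a sample.

    Uses the normal approximation to the binomial; z=1.96 gives roughly
    a 95% confidence interval.
    """
    n = len(sample_labels)
    rate = sum(sample_labels) / n
    margin = z * math.sqrt(rate * (1 - rate) / n)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical set of 10,000 documents the search did NOT return, of which
# 300 are actually responsive (in practice this number is unknown).
rng = random.Random(42)
not_returned = [True] * 300 + [False] * 9700

# Reviewers would code this random sample by hand.
sample = rng.sample(not_returned, 400)

rate, low, high = estimate_rate(sample)
print(f"estimated miss rate: {rate:.1%} (95% CI {low:.1%} to {high:.1%})")
```

If the upper end of the interval is higher than the parties consider acceptable, the search is refined and the sample is redrawn.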

In short, we think the claims issued in the patent and the associated workflow are so commonly used that the workflow is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant a re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to pick up existing prior art and devise a similar workflow that achieves the same or better outcome, and any attempt to enforce the patent will likely be challenged.

But there is an even bigger issue at stake here beyond the status of Recommind’s patent: namely, shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license their technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery over two years before Recommind announced their first version of Axcelerate.

Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the sort of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect investments by an organization are fine, but process patents that attempt to patent a workflow are bad for business. With such an approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much more room to grow and develop) can be fully realized in a way that both provides vendors with the fair and just economic rewards they deserve while helping the legal system become radically more efficient.

Electronic Discovery Cases You Must Know

Tuesday, May 10th, 2011

I was at the Sedona midyear meeting last week and during Ken Withers’ excellent discussion of recent e-discovery case law, a few thoughts occurred to me. First, there are so many cases coming out now each week it’s hard to stay above the fray and mine for useful nuggets. The task is a bit Sisyphean, so folks like Ken (who keep a rolling index of cases) are particularly helpful. Next, I was struck by how hot Pension Committee still is, even after almost a year and a half. Certainly, this ongoing spotlight wasn’t an accident, and it’s almost certain that Judge Scheindlin is pleased by the ongoing debate.

I frequently get questions from enterprise clients regarding which cases they should know about, and so I put together an EDRM oriented (left to right) list for folks who just can’t get to all the latest cases. While it’s not an annual roundup per se, I do think it’s a bit more functional for busy electronic discovery professionals who need to stay current. So, here’s the buzz index of cases arranged by topic:

Preservation: The Legal Hold Gold Standard

Case: Pension Committee of the Univ. of Montreal Pension Plan, et al., v. Banc of America Securities, LLC, et al. (S.D.N.Y. 2010).

Summary: The dispute focused on claims by a group of investors who brought an action to recover losses of $550 million stemming from the liquidation of two British Virgin Islands based hedge funds. Unlike many typical e-discovery disputes, this action focused on the conduct of the plaintiffs as they attempted to deal with the often murky landscape of electronically stored information (ESI) preservation, collection and production. Judge Scheindlin goes out of her way to crystallize duties and identify the type of conduct that can cause an e-discovery breach. “After a discovery duty is well established, the failure to adhere to contemporary standards can be considered gross negligence. Thus, after the final relevant Zubulake opinion in July, 2004, the following failures support a finding of gross negligence, when the duty to preserve has attached:

  • to issue a written litigation hold;
  • to identify all of the key players and to ensure that their electronic and paper records are preserved;
  • to cease the deletion of email or to preserve the records of former employees that are in a party’s possession, custody, or control;
  • and to preserve backup tapes when they are the sole source of relevant information or when they relate to key players, if the relevant information maintained by those players is not obtainable from readily accessible sources.”

Why it’s (still) important: First of all, Pension Committee is written by Judge Scheindlin, who is the most famous electronic discovery jurist on the planet. Next, since she’s in the Southern District of New York, it means that folks even in other jurisdictions that aren’t bound by her opinions still must take heed given the fact that New York is home to so many multinational organizations. Finally, her opinion is the clearest (even if disputed) articulation regarding the standard of care for the issuance of legal holds and the duty to preserve ESI. She attempts to categorically define conduct that is grossly negligent and therefore susceptible to extreme sanctions, including spoliation inferences and terminating sanctions. Fortunately, she recognizes the numerous challenges associated with electronic discovery. And, so as to blend in a healthy dose of reality Judge Scheindlin also said: “In an era where vast amounts of electronic information is available for review, discovery in certain cases has become increasingly complex and expensive. Courts cannot and do not expect that any party can meet a standard of perfection.”

In the end, Pension Committee was the case of the year in 2010 and even in 2011 it’s generating an unprecedented level of retrospectives (here and here). It may be because Judge Scheindlin’s relatively bright line standard has created so much debate, but the Pension Committee discussion will likely continue for the foreseeable future (perhaps only ending when/if the culpability rules are amended to create a unified national standard).

Preservation: Why Preserve in Place is Risky?

Case: Wilson v. Thorn Energy, LLC, (S.D.N.Y. 2010).

Summary: In Wilson, the defendant corporation identified a flash drive that contained relevant ESI, but rather than copying that data safely to a centralized evidence repository, the defendant’s employee chose to hold on to the drive, putting it instead into a desk drawer. When the files were requested for review and production, the files could not be read from the drive. The defendant’s employee attempted to recover the ESI contained on it, but those efforts failed. Granting plaintiffs’ motion for sanctions, the court ordered that defendants would be precluded from offering evidence at trial concerning the data contained on the discarded drive.

Why it’s important: In today’s e-discovery world, many organizations are instituting hold processes via manual solutions and then waiting weeks or months to ultimately collect the ESI. Wilson shows the danger of simply preserving data and makes the argument that you should either “collect to preserve” or collect very shortly after the litigation hold notice goes out. While focusing on a certain media type (flash drive), this analysis can be extended to any digital system containing ESI that inherently has some set failure rates or can be imagined to fail without express, conscious action (due to loss, theft, recycling, etc.).

Identification & Collection: “Manual” Collections Come Under Fire

Case: Green v. Blitz U.S.A. (E.D. Tex. Mar. 1, 2011)

Summary: In this case, Plaintiff sought to re-open her lawsuit despite prior settlement upon learning that defendant had failed to produce relevant documents. Finding that defendant had committed discovery abuses, including failing to disclose relevant evidence and failing to issue a litigation hold, the court ordered defendant to pay plaintiff $250,000, to provide a copy of the court’s order to plaintiffs “in every lawsuit proceeding against it” for the past two years and to file the court’s order in every case that defendant is involved in for the next 5 years. It was revealed that the employee “solely responsible for searching for and collecting documents relevant to litigation” issued no litigation hold, conducted no electronic word searches for emails, and made no effort to speak with defendant’s IT department regarding how to search for electronic documents.

Why it’s important: Green is the latest in a line of cases [See also Ford Motor Co. v. Edgewood Properties Inc., 257 F.R.D. 418 (D.N.J. 2009) and Phillip M. Adams & Assoc., LLC v. Dell, Inc., 621 F. Supp. 2d 1173 (D. Utah 2009) ] that have been highly critical of manual (or self) collection efforts by the individual custodians. Historically, if the custodians were monitored/supervised enough by counsel, this manual collection process was largely deemed defensible, but it looks like this behavior is simply too risky for any conservative enterprise. The better practice is to leverage the custodians to point out where relevant ESI might exist and utilize software tools to conduct broad collections from key players. While it’s not necessary to use IT tools to collect data immediately for all custodians who have received a litigation hold notice, it’s probably unreasonable to not quickly collect ESI (via formal, IT based methods) from at least some subset of key players. The main point is that this isn’t an all or nothing calculation. Costs, risks and benefits should all be carefully evaluated and documented, in case there’s a downstream challenge.

Analysis & Review: Failure to Test Keywords and Sample

Case: Mt. Hawley Ins. Co. v. Felman Prod., Inc., (S.D. W. Va., 2010).

Summary: In this case the court examined the reasonableness of plaintiff’s precautions to prevent disclosure of email, which was inadvertently produced by the plaintiff amidst “a massive disclosure of e-discovery.” The Mt. Hawley court applied the five-factor test established in Victor Stanley, Inc. v. Creative Pipe, Inc. (D. Md. 2008) and found that the producing party had not taken reasonable steps during discovery. In particular, the court was unwilling to find that the inadvertent production of 377 privileged documents was “solely attributable” to a technological glitch and instead found that plaintiff and counsel “failed to perform critical quality control sampling to determine whether their production was appropriate and neither over inclusive nor under-inclusive.” This finding meant that their attorney client privilege was waived as to the subject documents.

Why it’s important: Mt. Hawley demonstrates why sampling and keyword search term formulation are critically important to any defensible discovery effort. In many instances where “blind” keyword strategies are used, the producing party is taking on an undue risk, in essence flirting with the “3rd rail” of electronic discovery (inadvertent production). Blind keyword searching (followed by brute force review and production) is sadly still a very common practice today. My hope is that cases like Mt. Hawley will force the blissfully ignorant practitioners to take stock of their risky practices and get with contemporary best practices like ECA, sampling, iterative search and the like.
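To make the idea of testing keywords against a sample a little more concrete, here is a minimal Python sketch. It is not drawn from the Mt. Hawley opinion or from any particular review tool; the function names, the substring matching, and the tiny hand-labeled sample are all illustrative assumptions. It draws a random sample of document IDs for attorney review and then measures how well a privilege keyword list performs on the labeled results.

```python
import random


def draw_sample(doc_ids, size, seed=0):
    """Pick a simple random sample of document IDs for manual review."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced later
    return rng.sample(doc_ids, min(size, len(doc_ids)))


def keyword_hit(text, terms):
    """Return True if any search term appears in the text (case-insensitive)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def evaluate_terms(labeled_sample, terms):
    """Compute (precision, recall) of a keyword list on a hand-labeled sample.

    labeled_sample is a list of (text, is_privileged) pairs, where the label
    comes from a human reviewer (hypothetical data in this sketch).
    """
    tp = fp = fn = 0
    for text, is_privileged in labeled_sample:
        hit = keyword_hit(text, terms)
        if hit and is_privileged:
            tp += 1  # privileged document correctly caught by the terms
        elif hit and not is_privileged:
            fp += 1  # non-privileged document flagged unnecessarily
        elif not hit and is_privileged:
            fn += 1  # privileged document the terms missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical reviewed sample and privilege terms, for illustration only.
sample = [
    ("Advice from counsel re: merger", True),
    ("Lunch on Friday?", False),
    ("Attorney-client privileged memo", True),
    ("Counsel for the opposing side called", False),
    ("Legal opinion attached", True),
]
terms = ["counsel", "privileged"]
precision, recall = evaluate_terms(sample, terms)
```

A low recall on the sample is the warning sign that matters most here: it suggests privileged documents are slipping past the terms, which is exactly the kind of gap that quality control sampling is meant to surface before production.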

Conclusion

Simply by creating such a list, I’m sure to leave off cases other folks think are more buzzworthy. But, for me, having a few good legal chestnuts is better than trying to boil the ocean and synthesize all the available case law. If you have any comments, I’d be eager to hear them (good, bad or indifferent).

The Story Behind Clearwell’s New Litigation Hold Module

Wednesday, March 16th, 2011

The amazing thing about the litigation hold process is that everyone is doing it, but the vast majority of people are still doing it manually.

Every company in every case has a duty to preserve, and the only way to meet that obligation is to send out litigation hold notices, track responses, and monitor compliance. To help companies do this, software vendors like PSS (now part of IBM) and Exterro have had products on the market for years. But, despite being good applications with several happy customers, they have only been adopted by about 100-200 customers in aggregate, which is a tiny fraction of the thousands of companies struggling to manage the litigation hold process. Why the low penetration rate?

Well, it turns out there are several good reasons. First and foremost, these applications are expensive, and usually cost over $250K in software licenses, hardware, and implementation services. Second, they take a long time to deploy, often requiring services engagements lasting more than 6 months. Finally, they are pure workflow solutions which are disconnected from the data. That makes it hard to keep them up to date, and means you cannot use them for later stages of the e-discovery process such as collecting data and then processing it. So the audit trail, which is important for defensibility, is often incomplete, and there’s the real risk of “disconnects” between different phases of the e-discovery process.

At Clearwell, we find this type of situation – where there’s a clear market need that’s unaddressed by existing solutions – absolutely fascinating. It led us to ask: what if there was an inexpensive litigation hold solution that’s easy to deploy AND is tightly integrated with identification, collection, processing, ECA, and review? For the past few months, we’ve been working closely with a large number of our customers to answer that question. Our goal was to design a product that will meet the needs of the mass market, which today is still using spreadsheets to track its litigation holds.

The result is Clearwell’s new Litigation Hold Module, which we announced on Monday and which is available this month. It brings Clearwell’s trademark ease-of-use and quick time-to-value to the preservation stage of EDRM, and enables customers to manage all their cases from cradle (preservation) to grave (production) within a single product. From the initial conversations with customers, the response has been incredibly positive. Prior to the product shipping, it has been purchased by CA Technologies, Exterran, Flowserve Corporation, and several others, with many more evaluating it for purchase this month. As with all our products, we offer free evaluations and I encourage anyone responsible for managing the litigation hold process to give it a try.

For those who are keeping count, this is Clearwell’s fourth module, all built as a single integrated product. We first came to market in 2006 with processing and ECA. In 2009, we expanded to the right of the EDRM model by adding a module for review and production. The following year, in 2010, we moved left (in EDRM terms) by releasing a module for identification and collection. Now in 2011, this new module for preservation and managing the litigation hold process completes the picture. It makes Clearwell the only fully integrated, end-to-end e-discovery product suite, since other vendors either offer a narrower product footprint or have cobbled together disparate products via acquisition.

Coming on the heels of Transparent Concept Search, this is our second major product announcement of the year – and there will be more to come. The product development team is bursting with new ideas, and we have a rich pipeline of new technologies and products slated for release in the coming months.

Clearwell Streamlines the Legal Hold Process with the New Clearwell Legal Hold Module

Monday, March 14th, 2011

(Editor’s note: This special guest post was written by Teddy Cha, Clearwell Senior Product Manager, MIT alum, and coffee connoisseur. Teddy was a key member of the team that developed our Legal Hold Module and has worked tirelessly with our engineering team and lead customers to bring the product to market. – Kurt)

Legal holds are a critical first step in any e-discovery process, but as recent experience has shown, enterprises are still struggling to perform them in a defensible and repeatable way. A judicial warning was heard as early as 2003 with Judge Scheindlin’s ruling in Zubulake v. UBS (and most recently in Pension Committee). The need for change is not coming from only a single judge, however. In 2010, the Duke Law Journal studied the level of sanctions compared to previous years and found that:

  1. Sanctions are at an all-time high (up 271% since 2005)
  2. Damages were as high as almost $9 million
  3. The most common misconduct was the failure to preserve data

Sending legal hold notices can start out simple, but it can quickly become unwieldy if not managed correctly. It’s like taxes. Everybody has to do them, and it typically starts out as a “simple” process. But as your assets grow, you may want to invest in more complex software or an online service to maintain efficiency. And once you start a family (or a small business), you’ll need to graduate to a much more robust process.

As companies grow, their legal hold process evolves in the same way. The progression can be described in three distinct stages:

Stage 1: Manual Legal Hold Process

Sending a litigation hold notification is as easy as…well, sending an email. But tracking these litigation matters and their responses in spreadsheets quickly grows out of hand once a poor paralegal has to manage a 10th, 20th and 50th simultaneous legal matter (or even multiple holds in a single case). This manual process is difficult to repeat, error-prone, and likely doesn’t reflect the real-time status of compliance the second the spreadsheet is saved. Typical corporations are concurrently managing hundreds of active legal holds, involving thousands of custodians, across multiple business units and groups. It quickly becomes apparent that a better solution is required.

Stage 2: Stand-Alone Legal Hold Software

Legal Hold solutions have been in the marketplace for a number of years. Typically they fall into two categories:

  1. Matter Management or Information Governance systems that help enterprises construct workflows and integrate record management policies and controls. Legal Hold notification capabilities are an appended component to these ambitious and holistic solutions. These systems are typically expensive and have long implementation cycles.
  2. Narrowly focused offerings aimed at managing just legal hold notification and survey tracking. These solutions typically cost less than the above and are delivered as a hosted service (SaaS).

Stand-alone legal hold software products are certainly an improvement on the Stage 1 manual process. But despite virtually all major enterprises needing some sort of legal hold process, they have not yet raced to embrace these Stage 2 solutions. Why not?

Following a typical e-discovery case quickly uncovers the problem. Sending and tracking legal holds is a necessary part of the e-discovery process, but it is only the first step. Soon after custodians are notified of their obligation, e-discovery teams must separately collect, process, analyze, review, and produce that data using other solutions. Stage 2 legal hold solutions are stuck just managing the holds.

This is where purchasing a stand-alone legal hold solution is a bit like buying an iPhone without the network plan: You can’t really do much with it (well, you could play Angry Birds, but only if you download it over a WiFi connection). You can’t obtain your goal of mobile communication without a phone and a network plan.

Stage 3: Integrated Legal Hold Software

To address the drawbacks of Stage 2, many companies today are looking for a more integrated approach – one that marries legal hold with the rest of the e-discovery process. This is where Clearwell’s new solution can help. Once custodians have acknowledged the legal hold notice, Clearwell can immediately reach across the enterprise network and collect those custodians’ data. Once the data is collected, a few clicks of the mouse prepare it for early case assessment (ECA), analysis, and review.

As any experienced corporate IT and legal executive will tell you, such a comprehensive solution has long been promised, but has not come with fast implementation (i.e., up and running in a day), ease of use (i.e., no training required), or a single platform (i.e., one login for users and no exporting or importing of data between e-discovery phases). With this in mind, we are delighted to announce the Clearwell Legal Hold Module, now available as part of the Clearwell E-Discovery Platform. Combined with Clearwell’s Identification & Collection, Processing & Analysis, and Review & Production modules, companies can now leverage a truly integrated e-discovery solution to lower the cost and risks of e-discovery. Key features of the new Module include:

  • Hold Notices: Hold notices can be quickly created and sent to relevant custodians and system administrators via email. Different notices can be sent to custodians and system administrators, streamlining the notification process. Notices can be sent immediately or scheduled for delivery.
  • Auto-Reminders and Auto-Escalations:  Reminders and escalation notices can be scheduled for delivery to non-responsive custodians, eliminating the need for manual follow-up.
  • Custodian Surveys: Surveys containing single-choice, multiple-choice, or free form text questions can be created and issued to key custodians so administrators can easily capture information critical to a case, thereby expediting the interview process. Surveys can also be saved as templates to the Notice Library and reused.
  • Automated Tracking and Reporting: Administrators have immediate visibility into the status of all legal hold notices across all cases through a single pane of glass. Administrators can drill-down by case to view the status across all custodians, including those who have received and responded to their hold notices, and those who haven’t.

Until today, corporations have been making do with manual or stand-alone legal hold solutions that are neither scalable nor integrated with the rest of the e-discovery process, assuming more and more risk and incurring greater costs – never an ideal combination. Fortunately, it no longer needs to be that way.

(Teddy Cha is a Senior Product Manager at Clearwell Systems and the lead Product Manager for Clearwell’s Legal Hold and Identification & Collection Modules.)

How Do You Sample Electronically Stored Information (ESI) in E-Discovery?

Wednesday, February 9th, 2011

When confronted with an almost impossible data analysis problem, a tried and true technique to solve it has been the use of sampling. The mathematical analysis behind sampling has been studied for many years. Sampling has also been put into practice for well over seventy years, in fields ranging from predicting election results to assessing the quality of electric bulbs. Why not do the same for certifying your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the Electronic Discovery Reference Model (EDRM) Search Group authors covered in detail, with a strategy in a comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). And, while much of that work is still to hit the mainstream litigation scene as a general practice, I was pleasantly surprised to see it receive attention from a fellow blogger and litigator, Nick Brestoff, who highlighted this in a very thoughtfully crafted article in Law.com, titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties in getting a certifiable result that attorneys can stand behind. And, it is highly likely that the current practice is to certify your electronic discovery without a real measure of validity behind it.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the one who has the data, i.e., the producing party. While Rule 26(a) of the Federal Rules of Civil Procedure (FRCP) lists required disclosures and Rule 26(g) provides signing and certification guidelines, there is no agreed-upon way to specify sampling parameters or to report the results of sampling. It is in this context that Nick Brestoff’s article is significant – it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that there is a logistical problem with this: most litigators will balk at producing largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to specify, in their Rule 26(f) meet and confer, that the production be certified for completeness by also including a statement on sampling and its results. A simple request such as, “Sample the data for 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing side can then perform random sampling per the goals of that request, selecting 13,526 documents (based on the sampling table of the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target.
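To make the arithmetic behind such a request concrete, here is a minimal Python sketch of the textbook sample-size formula for estimating a proportion. It is not taken from the EDRM Search Guide. The function name `sample_size` is hypothetical, and the reading of “2% error rate” as a margin of plus or minus 1% with z rounded to 2.326 is my assumption; under that assumption the formula happens to reproduce the 13,526 figure quoted above.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin, p=0.5, z=None):
    """Number of randomly drawn documents needed so that an estimated
    proportion falls within +/- margin at the given confidence level.

    Assumes a very large population (no finite-population correction)
    and the worst-case proportion p = 0.5 unless told otherwise.
    """
    if z is None:
        # Two-sided critical value of the standard normal distribution.
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


# Assumption: "2% error rate" read as a total width of 2%, i.e. +/- 1%,
# with z rounded to 2.326 as in many printed tables.
print(sample_size(0.98, 0.01, z=2.326))   # 13526

# The same request read as +/- 2% needs a far smaller sample.
print(sample_size(0.98, 0.02, z=2.326))
```

The gap between the two readings is a good reason for the parties to spell out exactly what “error rate” means when they agree on sampling parameters.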

In addition to the EDRM Search Guide, The Sedona Conference Working Group commentary, Achieving Quality in the E-Discovery Process, is an indispensable resource for understanding the role of sampling. This paper discusses at length several sampling methods and their applicability for various purposes, including certifying that the results meet certain quality criteria. In addition, a number of electronic discovery cases have mentioned sampling as a way of overcoming the explosion of data volumes. A primary application of sampling is for evaluating proportionality claims, something that has moved from a simple assertion into an informed argument, with specificity on proving cost burden. Let’s examine a few.

Referring to the well-known Zubulake v. UBS Warburg, F.R.D. 280, the court in Makrakis v. Demelis, No. 09-706-C, 2010 WL 3004337 (July 13, 2010) ordered the producing party to essentially sample just a small number of backup tapes, at the expense of the requesting party. The order is also remarkable for shifting the cost of processing and reviewing the sample, however small, to the requesting party. Such measures, while reducing the overall costs of e-discovery, place a greater burden of sample selection on the requesting party, forcing them to apply the reasonableness evaluation.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery is appropriate and quoted an earlier case, S.E.C. v. Collins & Aikman Corp., 256 F.R.D. 403, 418 (S.D.N.Y. 2009), that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling recommendation in this instance was both a reduction in the number of custodians from forty to three and a significant reduction in the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced drastically to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to run a small set of search terms on a smaller number of custodians so as to get a sense for the larger electronic discovery costs. Clearone Communications v. Chiang offers another example of sampling by the use of Boolean logic to combine more common search terms, thereby avoiding over-inclusiveness.

Per the Sedona commentary definitions, this type of sampling is referred to as “judgmental sampling,” wherein the practitioner has a general sense of which custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery sampling can embrace the benefits of statistical sampling as well. It is a natural next step, as even with the narrow criteria of judgmental sampling, the cost of review can be high. One advantage of statistical sampling is that it yields quantifiable measures of error and confidence intervals, while judgmental sampling has no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality and if the producing party needs a basis for certifying their productions, statistical sampling can be a powerful aid.
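As an illustration of the “quantifiable measures of error” that statistical sampling offers, the following Python sketch estimates the share of responsive documents from a random sample and attaches a confidence interval. It uses the simple normal-approximation interval, which is one of several possible methods and not one prescribed by the Sedona commentary or the EDRM Search Guide. The function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_interval(hits, sample_size, confidence=0.95):
    """Normal-approximation confidence interval for a proportion.

    hits        -- documents in the random sample judged responsive
    sample_size -- total documents in the random sample
    Returns (low, high), clipped to the range [0, 1].
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = hits / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical review: 120 responsive documents in a random sample of 1,500.
low, high = proportion_interval(120, 1500, confidence=0.95)
print(f"Estimated responsive rate: {120 / 1500:.1%}")
print(f"95% confidence interval: {low:.1%} to {high:.1%}")
```

A judgmental sample of hand-picked custodians cannot support this kind of statement, because the documents were not drawn at random from the whole collection.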

Moody v. Turner: An E-Discovery Battle with No Winners

Friday, December 3rd, 2010

The electronic discovery blogosphere is filled with analysis of the recent opinion by Judge Sandra Beckwith of U.S. District Court for the Southern District of Ohio, on the Moody v. Turner case. What is striking about the case is that it reveals a huge gap in understanding the pitfalls of prolonged discovery disputes in the context of attempts by thought leaders to prevent exactly the issues elicited in this opinion. As the excellent post by Ralph Losey indicates, in this case, it is an affront to have this play out in front of Judge Beckwith, a signatory to The Sedona Conference Cooperation Proclamation.

In reviewing the facts of the case, here are highlights on some of the process missteps:

Lack of Early Data Analysis

It is not obvious to some how important it is to perform an early analysis of the data before agreeing to search ESI for a certain number of custodians and apply certain keywords. This case illustrates three reasons why early data analysis is critically important.

First, the producing party must identify and communicate the right list of custodians. If there is any change or expansion of scope, that needs to be communicated as well. In this case, the Defense team, at their pre-trial 26(f) conference with the Plaintiffs, agreed to produce ESI for twenty-six custodians, but chose to send Preservation Notices to a larger number of individuals. While this act by itself is commendable, the lack of prompt communication to the Plaintiffs is certainly a misstep that the Plaintiffs chose to latch on to as incomplete production of ESI.

Second, the producing party must have a handle on the scope of searches before committing to “run them.” In reviewing the document Case: 1:07-cv-00692-SSB Doc #: 43, Exhibit 7, it is apparent that the twenty production requests in that report are not trivial. An early analysis of both the data and the searches, at least on a small sample, would have helped the producing party understand the scope and challenges of running those searches.

Third, the producing party must assess their collection, search, and production methods to determine the feasibility of producing metadata. As evidenced in the Plaintiffs’ motion (Doc-89, Page 19), it is clear that the Defense did not produce TIF images along with searchable text. However, as noted in Doc-118 (Page 18, footnote 10):

“In any event, parties are generally not required to produce the metadata of their data sets. See Wyeth v. Impax Labs., Inc., No. 06-222, 2006 WL 3091331 at *2… Turner has produced all ESI in TIFF format, except for Excel spreadsheets which were produced in native format given the substantial size of many of the spreadsheets (which, if in TIFF format, may print across hundreds of pages). Judge Hogan therefore rightfully declined to compel Turner to produce any additional metadata.”

This is a fairly common request and one that the Plaintiffs could have raised in their pre-trial 26(f) conference.

Out of Control Production Requests

In reviewing the aforementioned court document, Doc #: 43, Exhibit 7, one can glean a wealth of information on the nature of searches requested by the Plaintiffs and the responses by the Defense team. The immediate problem evident in these requests is an issue raised by the Defense team – that the search requests are overly broad. Some of the search terms are “plan”, “method”, “rate” and “account”, which are certain to hit a very large number of documents. See below for one of the requests.

Production 1-Item 2: All documents other than emails that can be electronically or digitally searched as containing one or more terms that concern the Plan in any way or cash balance pension plans and contain the word “accrual,” “benefit,”, “benefit accrual,” “accrual of benefit”, “accrual methods,” … “calculate”, “calculation”.

This goes on and on, for about eighteen pages. Combined, the twenty production requests would clearly hit almost every collected document (a total of 118GB of documents), thus making a follow-on privilege or confidentiality review prohibitively expensive. It is the lack of specificity in these searches that makes the discovery request overly broad. On the other hand, the response from the Defense also appears to be poorly constructed. In their response, what we see is the same boilerplate text, which didn’t escape the notice of the Plaintiffs and the court.

“Defendants object to this Request because it is overly broad, unduly burdensome, seeks documents that are neither relevant nor likely to lead to the discovery of admissible documents and (because Plaintiffs define “documents” to include electronic or computerized data compilations) seeks electronic documents that are not reasonably accessible due to undue burden and/or cost. Defendants further object to this Request because it implicates documents protected by the attorney-client and/or work-product privilege and any such documents will be withheld from production”

What would have helped the Defense’s case is actual data supporting their claims. For example, if the defendants had tabulated words such as “plan” and “benefit” and provided actual document and/or hit counts, it would have bolstered their claim. As expected, the lack of such data caused the Plaintiffs to submit a further filing, Doc-89, with a host of complaints, chief among them:

Defendants reported only (1) the total number of unique documents captured by the search of 17 terms and (2) the number of documents that contained the term “cash balance” but none of the Plaintiffs’ other terms. See Doc. 77-10 at 2.
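The per-term tabulation described above is straightforward to produce. The Python sketch below counts, for each search term, how many documents contain it and how many total hits it generates. The function name, the sample documents, and the whole-word matching rule are all hypothetical choices made for illustration; a real e-discovery platform would report these counts from its own index.

```python
import re
from collections import Counter


def tabulate_terms(documents, terms):
    """Return {term: (documents_containing_term, total_hits)}."""
    doc_counts, hit_counts = Counter(), Counter()
    for text in documents:
        for term in terms:
            # Whole-word, case-insensitive match; phrases are matched literally.
            pattern = r"\b" + re.escape(term) + r"\b"
            hits = len(re.findall(pattern, text, re.IGNORECASE))
            if hits:
                doc_counts[term] += 1
                hit_counts[term] += hits
    return {term: (doc_counts[term], hit_counts[term]) for term in terms}


# Hypothetical three-document collection.
documents = [
    "The plan sets the benefit accrual rate. The plan is reviewed yearly.",
    "Lunch plan for Friday",
    "Cash balance account statement",
]
report = tabulate_terms(documents, ["plan", "benefit", "cash balance"])
for term, (docs, hits) in report.items():
    print(f"{term!r}: {docs} documents, {hits} hits")
```

A table like this, run against the real collection, turns an “overly broad” objection into a measurable claim that the other side and the court can evaluate.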

Furthermore, the Plaintiffs appear to be on the right track, recommending:

On October 14, Plaintiffs wrote to Defendants and proposed an “iterative search process” to decide on a final set of search terms.

It seems clear that, in the ongoing discovery disputes, an iterative search process was perceived as contrary to zealous advocacy of the client’s position and not as a path to resolving further disputes, much as the Cooperation Proclamation suggests. In this context, engaging a search expert is essential – someone who can modify the search to include more restrictive criteria to limit your search results. Why bother running an open-ended search and producing 29.4GB of useless junk, when you can combine these terms with Boolean, proximity, and other searches? The types of searches, and what each can offer, are a topic that the members of EDRM tackled in formulating their EDRM Search Guide, which is a must-read for anyone attempting to construct e-discovery searches.
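To show how much Boolean and proximity operators can narrow a result set, here is a small Python sketch that compares an open-ended “plan OR account” search with a query requiring “plan” together with “accrual” near “benefit” or “rate”. The helper names, the four sample documents, and the three-word proximity window are hypothetical; they do not reflect the actual terms negotiated in the case or the syntax of any particular search product.

```python
import re


def tokens(text):
    """Lower-cased word tokens of a document."""
    return re.findall(r"[a-z0-9']+", text.lower())


def contains(text, term):
    """True if the single word `term` appears in the document."""
    return term.lower() in tokens(text)


def within(text, a, b, distance):
    """True if words a and b occur within `distance` words of each other."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


# Hypothetical collection.
documents = [
    "The pension plan changed the benefit accrual method last year.",
    "Our vacation plan is to rent a cabin.",
    "Please open a new account for the office party plan.",
    "Accrual rate under the cash balance plan was revised.",
]

# Open-ended search: plan OR account.
broad = [d for d in documents if contains(d, "plan") or contains(d, "account")]

# Narrowed search: plan AND (accrual W/3 benefit OR accrual W/3 rate).
narrow = [
    d for d in documents
    if contains(d, "plan")
    and (within(d, "accrual", "benefit", 3) or within(d, "accrual", "rate", 3))
]

print(len(broad), "documents hit by the broad search")
print(len(narrow), "documents hit by the narrowed search")
```

Running both versions on a sample of the collection before committing to them is exactly the kind of iterative process the Plaintiffs proposed.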

Proportionality Arguments Without Strong Basis

An important point to note is that any discovery request that uses inefficient processes and inappropriate technologies will certainly result in undue burdens and cost. It appears that the Defense team did not offer proper cost estimates (arguments put forth in Doc-77-10 notwithstanding), and just pushed an undue burden/cost argument with the hope that the court would absolve them of discovery obligations. At the same time, the Plaintiffs seem to have overreached a bit in extending their discovery disputes with the hope of reaching a favorable outcome. Two examples of such attempts are:

  1. Upon Defense producing the documents (Doc-118),

Turner has produced every responsive, non-privileged document obtained through the email ESI searches that related to the Plan; these comprise 4.1 GB, or more than 40,708 pages of documents.

The Plaintiffs counter with:

“Plaintiffs maintain that Turner should be compelled to produce the metadata for the email ESI it has produced because otherwise they allegedly “cannot know whether Defendants have searched all 33 custodians’ email files” and “cannot confirm whether any email files were electronic in origin (rather than printouts of emails) or determine whose files they came from.”

As noted earlier, request for metadata and the feasibility of producing it must be negotiated specifically in the 26(f) conference.

  2. The Plaintiffs’ attempt to expand discovery to compel any and every third party, including the Defense’s former law firms, as well as to inspect “shared network drives”, “non-shared drives”, etc.

“Judge Hogan recognized that Turner should not be compelled to probe through the recesses of its internal electronic systems for even more ESI on top of the 47,000-plus hard copy documents and the 40,000-plus pages of ESI it has produced – because those additional searches are not likely to lead to the discovery of any evidence relevant to plaintiffs’ claims. Judge Hogan was presented with the gory history of Turner’s efforts to search through “shared network drives” and “non-shared drives,” emails and backups. He found these efforts to be sufficient, and rightly rejected plaintiffs’ demand for additional ESI.”

One can see that the Plaintiffs’ attempt to drag the electronic discovery efforts into an endless battle was counterproductive.

Final Takeaway

The Sedona Conference Cooperation Proclamation rightfully recommends “Jointly developing automated search and retrieval methodologies to cull relevant information”. As costs for getting to the facts escalate, a comprehensive strategy that uses the best processes, the best technology, and a commitment to the Cooperation Proclamation is essential for the legal system to deliver what people expect – justice based on facts. Gamesmanship as evidenced in Moody v. Turner is detrimental to this cause.

Automated Review in Electronic Discovery Re-Visited

Monday, June 28th, 2010

Almost two years ago I wrote one of my first blog posts, entitled “Review-less E-Discovery Review.”  Despite the tongue twister of a title, the post posited that “there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.” I’d like to take a look and see how much (if at all) my prognostications have materialized.

A cynic might think that this is the moment where E-Discovery 2.0 jumps the shark.  But no, this isn’t one of those sitcom episodes where they flashback to previous shows as an easy way to recycle content.  Instead, it seems useful to see how the legal market has evolved from a litigation workflow perspective, particularly with some vendors touting the benefits of review-less technologies like predictive coding.

In the original post, I noted that there was a “scenario where a non-manual review methodology may make sense” (while importantly noting that “this approach is not without risk”).  Since my last post there has been the successful adoption of Evidence Rule 502, which makes this methodology (at least conceptually) safer.

But again (imagine dreamy flashback mode), here were the guidelines I previously proffered:

  1. Large data set.  This may sound a bit obvious, but a non-manual approach is best suited for large, unwieldy data sets.  The corpus doesn’t need to be in the terabytes, but the data set should be evaluated in terms of discovery processing costs and attorney review estimates.
  2. Short Production Timelines.  Once the above calculations are conducted, the next step is to determine if a human based review could even conceivably be conducted in the given time frame.  In many instances, an eyes-on review process just won’t be feasible since there won’t be enough bodies to throw at the problem.
  3. Next Gen “PAR” Tools.  In order to pull this “review-less” review process off, both safely and quickly, the responding party needs to have access to fast, robust processing, analysis and review (“PAR”) tools.  Certainly, it’s possible to have this scenario work with an e-discovery service provider, if they have the capability.
  4. Relatively Small Amount in Controversy.  For the time being, this approach should not be considered for any “bet the company” litigation, nor anything with significant downside risk (governmental inquiries, punitive damages, class actions, 2nd requests, etc.).  Yet, for many standard commercial lawsuits, corporate investigations, HR claims, etc. this review-less approach may be worth considering.
  5. Ability to Use a Clawback Provision.  Entering into a clawback provision with the opposition is mandatory in this methodology since the chances of an inadvertent production are statistically ever-present.  Yet, until Evidence Rule 502 is resolved, there will always be a risk that the clawback won’t be enforceable against 3rd parties.
  6. Non-governmental Production.  Most information in governmental productions becomes part of the public record, meaning that a clawback isn’t going to be feasible.  Here, trade secret information, personally identifiable data and the like would be disastrous if pushed out into the public domain.

The goal of this post is to see if this dog is any more ready to hunt than it was two years ago.  The short answer (right now) appears to be: No.

We all know that litigators are both risk averse and generally slow to adopt new technology approaches.  This is particularly true when there’s a perception that they won’t have insight into the technological black box behind automated coding/tagging decisions.  Litigators are understandably sensitive about the ability to prove up the reasonability of their search and review processes.  This “reasonableness” requirement lines up both with the Victor Stanley requirements and FRE 502(b), which eliminates the chance of a waiver only “if the holder of the privilege or work product protection took reasonable precautions to prevent disclosure.”

Given this ongoing hesitancy, the question remains: shouldn’t we be seeing more movement in automated review than the glacial progress that’s been achieved to date, particularly with the known shortcomings of the eyes-on review process?  Most are familiar with the 1985 STAIRS study by Blair and Maron, where the percentage of relevant documents lawyers thought they had found using Boolean keyword searches was 75% – when the percentage they actually found was 20%.

But, despite the known deficiencies of eyes-on review, it falls into the “go with the devil you know” mindset that often makes sense when dealing with judges and juries who aren’t likely to grok newer-fangled approaches.

In addition to these high-level, almost dogmatic challenges, there is one other tactical element I’d add to my previous list (of 6 factors).

7. All documents processed up-front (no rolling collection). I’ve heard some in-the-trenches e-discovery experts claim that they’ve never had a case that didn’t involve at least some level of incremental data collections.  Whether this is an overstatement is immaterial.  The fact is that a large number of e-discovery projects involve ESI that is collected (and then processed) in dribs and drabs.  This is often a good thing, largely attributable to the incremental (start slowly) nature of a well thought out e-discovery project where a smaller number of initial custodians are processed, then ECA is conducted and only then is the additional ESI added to the corpus.  This common methodology causes some significant heartburn for a review-less methodology since the ever changing nature of the corpus makes it difficult/impossible for a sample to be truly extensible to what will eventually be the entire data set.  For this reason, the review-less approach should be limited to where the entire corpus is collected and processed at once.

In sum, the seven foregoing factors appear to still be largely valid and create an environment where an automated, review-less methodology will only make sense in a relatively rare set of circumstances.  This may change in the future, but given the risk-averse DNA of most litigators I can’t imagine this tipping point happening any time soon.
