Archive for the ‘in-house e-discovery’ Category

Jumping the Gun? Three Approaches to Drafting New Federal Discovery Rules

Thursday, September 1st, 2011

In my last post I announced that discussions are taking place that could change the way preservation and sanctions issues are handled within the federal court system.  The next round of discussions about possible amendments to the Federal Rules of Civil Procedure (FRCP) is scheduled to take place on September 9th in Dallas, Texas as part of a “mini-conference” led by the Discovery Subcommittee – a committee appointed by the Advisory Committee on Civil Rules.  This post discusses three different rule amendment approaches that attendees have been asked to consider in order to help them prepare for the mini-conference.  A complete list of attendees, preparation materials, and questions the group will consider are included in the Advisory Committee’s June 29, 2011 memorandum to the participants.

The debate about whether or not rule amendments are even required is far from over.  A 452-page document located on the U.S. Courts’ website chronicles many of the meetings, notes, and submissions driving the current discussion.  Page 265 of the document contains a memorandum prepared by the Civil Rules Advisory Committee earlier this year, stating that:

“the Subcommittee has reached no conclusion on whether rule amendments would be a productive way of dealing with preservation/sanctions concerns, much less what amendment proposals would be useful.”

Despite concerns that amending the current rules now would amount to jumping the gun, there is an undeniable desire for more clarity around when the duty to preserve electronically stored information (ESI) is triggered, what must be preserved, and when the duty expires.  This momentum has resulted in the crafting of draft proposals that are likely to help frame the discussion on September 9th. The “proposals” are really draft approaches that have been broken down into three general categories described in the Civil Rules Advisory Committee’s memorandum, titled: “PRESERVATION/SANCTIONS ISSUES” (see page 263).  The Category 1 approach can best be described as providing a higher degree of specificity than the other approaches.  For example, the Category 1 approach provides a fairly detailed explanation of the duty to preserve evidence (Rule 26.1(a)) and details possible triggers (26.1(b)), the scope of the duty to preserve (26.1(c)), and sanctions (Rule 37).  Category 2 proposes a more general preservation rule, while Category 3 only addresses sanctions as a tool for influencing behavior.  The three categories are discussed in more detail below.

Category 1: Specific Rule

This draft includes many different exemplary lists, alternative approaches, and footnotes that highlight the fact that one of the key challenges with drafting a specific rule is trying to foresee all of the challenges that might lie in the road ahead.  For example, the draft rule provides a long list of events that could trigger the duty to preserve evidence, including everything from serving a pleading to taking “any other action” in anticipation of litigation.   The rule also provides a list of information types that are “presumptively excluded” from the preservation duty, such as deleted data on hard drives, temporary internet files, and physically damaged media.

The lists are helpful in that they provide guidance.  However, each list also includes a “catch-all” provision to address scenarios that might not be foreseeable.  The inclusion of catch-all provisions highlights the inherent challenge of providing more clarity and certainty without creating rules that are so inflexible that they are difficult to apply to unforeseen factual scenarios or technological developments.  Some might argue that trying to provide a laundry list of examples will make passage of new rules difficult because each item on the list will stir debate.  Others contend that the lists add little value because the catch-all provisions will still require litigators to pass the sniff test of “reasonableness.”

Despite the inherent challenges related to drafting rules with specificity, most practitioners would likely support the inclusion of lists or examples that provide at least some direction.  What is likely to be far more controversial with respect to Category 1 is the use of alternative language proposing fixed limits around custodians and litigation holds.  For example, one alternative would limit data preservation requirements to a fixed number of custodians and the duty to preserve evidence would similarly expire after a fixed number of years.  Bright line rules like these may be easier to understand, but they also tend to be controversial since they lack the flexibility necessary to fairly address every conceivable situation.

Category 2: General Rule

Like the Category 1 proposal, the Category 2 proposal uses lists and outlines several alternative approaches throughout the rule.  However, the Category 2 proposal fundamentally differs from Category 1 by outlining a more general approach.  For example, one of the alternatives essentially states that the duty to preserve evidence is triggered whenever a “reasonable person” would expect to be a party to an action.  Similarly, the ongoing duty to preserve information after the duty has been triggered would be evaluated based on what is described as a “reasonable period” under the circumstances.

The beauty of this more general approach lies in its simplicity and flexibility.  The idea is that evaluating conduct based on the “reasonableness” of a person’s actions is much easier than attempting to draft bright line legal guidelines that account for every possible factual scenario.  The flip side is that reasonable minds could differ and results could be inconsistent if there are no bright line rules.  What this means in the context of the federal rule discussion is that one judge might find a party’s conduct with respect to data preservation efforts reasonable, while another judge might issue sanctions based on the same set of facts.  In large part, it is this lack of certainty and guidance in the current rules that sparked the current debate in the first place.

Category 3: Sanctions-Based Rule

Unlike the first two categories, the Category 3 approach focuses only on sanctions and would act like more of a “back-end” rule.  In other words, the rule would not contain any specific directives about preservation, but it would provide direction in the areas of when and how sanctions might be applied.

Despite the draconian image a “sanctions” based rule might conjure up, the Category 3 rule may seem surprisingly lenient to some.  For example, absent extraordinary circumstances, the court would be prohibited from imposing any of the sanctions listed in Rule 37(b)(2) or from giving an adverse-inference instruction unless:

“the party’s failure to preserve discoverable information was willful or in bad faith and caused [substantial] prejudice in the litigation.”

The sanctions based approach would almost certainly have an impact on how parties handle upstream preservation related issues.  However, the key ingredients that will impact what kind of behavior this rule drives are the severity of the threatened sanction as well as the applicable standard.  For example, a party facing severe sanctions for conduct that is either negligent, willful or in bad faith is likely to take their preservation obligations seriously.  On the other hand, if the realm of possible sanctions is trivial, parties are less likely to take their preservation related obligations seriously.

Conclusion

The three rule approaches represent very early attempts at framing possible approaches to amending the FRCP.  If the Discovery Subcommittee chooses to recommend rule amendments following the September 9th mini-conference in Dallas, the proposed language is likely to be closer to final form and easier to assess than the current proposals.  I will continue to monitor the rule making discussion and provide commentary in future posts.  Stay tuned for my next post where former US Magistrate Judge Ron Hedges explains why he thinks the rule changes are unnecessary and why the current proposals might run afoul of the Rules Enabling Act.

Gibson Dunn’s Mid-Year eDiscovery Report Highlights Changes in Sanctions Landscape

Monday, August 15th, 2011

In past years we’ve covered Gibson Dunn’s Mid-Year E-Discovery Report which is always a good read, chock full of take-aways about the eDiscovery market.  In my mind, they do an excellent job of synthesizing the ever-expanding volume of case law and comparing those trends with historical averages.  This year’s report is no exception, and for those who don’t get to read all the cases, this is a stellar way to keep up on eDiscovery trends.  Without trying to summarize the entire 23-page document, there were a number of findings that stood out and should be perused by anyone with even a passing interest in the space.

Legal Holds/Preservation. As we all know, eDiscovery sanctions (at least here in the US) are critical business/legal drivers, particularly with regard to the legal hold area (which is the riskiest part of the EDRM).  As the Gibson report points out, the actual award of sanctions has remained relatively flat (56% in the first half of 2011 versus 55% for the full year in 2010) –  but, more important than this relatively stable metric, it’s very clear that the plaintiffs’ bar has caught on to the ability to win cases by revealing shoddy (or just undocumented) legal hold procedures, even in some instances where data isn’t lost.  This is why the report notes a dramatic increase in the seeking of eDiscovery sanctions – 68 at mid-year 2011 versus 31 at mid-year 2010.  This doubling of attempts to pierce an entity’s legal hold regime should be a wake-up call to in-house practitioners and chief legal officers, since the attempt and success rates will likely only increase over time.

While there is still some considerable debate, at least for those following Judge Scheindlin’s Pension Committee logic, anything less than a formal, written legal hold policy is per se negligent.  Although it’s conceivable that a reviewing court won’t use this rigorous standard, anything less formal will strike most organizations as simply too risky.  Ongoing compliance with the legal hold process is another difficult task for many organizations, one which is considerably easier with an automated solution that is able to track acknowledgements and send reminders over time.  It’s all too easy for companies to think that once they’ve discharged their initial legal hold duty they’re in the clear – but as these obligations morph (with more custodians/data types) and elongate (from months to years) over time, keeping on top of the legal hold processes becomes that much more important.

Sanctions. The Gibson report also importantly points out that there’s currently a split among jurisdictions: some courts will levy sanctions only upon a showing of bad faith, while others require mere proof of negligence.  Here, the important take-away is that a defendant entity doesn’t typically get to forum shop and therefore can’t really tell which type of jurisdiction it will end up in as a litigant.  So, it needs to build its eDiscovery processes to meet the high water (i.e., most rigorous) standard.  In most cases, it’s therefore prudent to be prepared to be sanctioned for merely negligent conduct – a less rigorous approach can potentially be safe, but that risk calculation needs to be considered carefully.

The other perilous part of the equation is that once sanctions are deemed warranted, the court has almost unlimited discretion to levy whatever blend of sanctions it thinks is appropriate.  In Green v. Blitz, for example, the court ordered a laundry list of sanctions, some of which were pretty unfathomable:

1. Defendant had to pay plaintiff $250,000

2. Defendant had to provide a copy of the court’s order to plaintiffs “in every lawsuit proceeding against it” for the past two years

3. Defendant had to file the court’s order in every case that it is involved in for the next 5 years

The bottom line is that sanctions, despite the fear factor, can be used to drive positive proactive conduct – namely in the shape of eDiscovery best practices.

Outside Counsel Duties. Here, the Gibson report notes that outside counsel’s Zubulake duties continue to increase over time, with a number of cases continuing the trend of holding attorneys responsible for ensuring that their clients properly implement legal holds, institute sound sampling protocols and conduct sufficient quality control steps.  This line of discussion can be useful when talking to outside counsel where we’re starting to see how their increasing responsibilities can lead to malpractice exposure, as seen in the recent McDermott case.

Search/Analysis. Lately there’s been a ton of buzz about predictive coding, but (despite the hype) it still doesn’t appear ready for prime time.  The Gibson report noted that there were no reported cases that addressed the use of predictive coding or other advanced search technologies.  My sense is that without some semblance of judicial approval or strong client backing, outside counsel (who are concerned about their malpractice exposure, per above) aren’t quickly going to be the first ones into the pool.  Unless an enterprise client demands that they use this type of technology, most will wait for judicial approval and that’s probably still a way off.  While next generation search technologies are more promise than reality right now, there is still a mandate to implement a defensible search methodology.  Such a methodology is needed initially to demonstrate transparency in the eDiscovery process and then to withstand the challenges levied by counsel in the case of an inadvertent production.

In sum, the Gibson report shows the ongoing maturation of the eDiscovery space.  But, any niche market led by case law and/or attorneys deciding to adopt new technologies won’t be quick to change.  In many instances, therefore, the best practices will be decided by a combination of standards bodies and vendors who are being pushed by their more forward thinking clients to get and stay on the cutting edge.

Clearwell’s Use In The Matter of Datel v Microsoft

Monday, April 4th, 2011

It’s widely known that Microsoft is a Clearwell customer, and uses our product for e-discovery across a wide range of matters. One such matter is the case of Datel Holdings v. Microsoft Corporation, which is presently in District Court for the Northern District of California. As part of those proceedings, Microsoft mentioned Clearwell in its Opposition to Datel’s Motion to Compel that was ruled upon on March 11, 2011:

Defendant explains that after potentially responsive documents were collected from custodians, they were loaded into a computerized document processing system known as “Clearwell.” Clearwell extracted metadata from each document and converted the documents into a format that allowed for text searching. Once the documents were processed through Clearwell, they were entered into an online platform, where they were reviewed by attorneys. For reasons still unknown to Defendant, Clearwell truncated some “Re-auth” documents during processing.

In itself, this sounds unremarkable. But we’ve noticed that some of our small competitors have been using this statement, and particularly the last line of it, to suggest that there are problems with the Clearwell product.

We realize that, as the market leader, there will always be small competitors seeking to leverage any opening to their advantage. Usually, we ignore this nonsense. But this time, to set the record straight, we asked our customer at Microsoft to respond on our behalf.

Here’s what Joe Banks, who manages the e-discovery team at Microsoft, wrote about the issue and gave us permission to publish:

Statement from Microsoft:

In regard to the Declaration of Hojoon Hwang referenced in the 3/11/11 Order granting in part and denying in part plaintiff’s Motion to Compel in Datel Holdings LTD v. Microsoft, No.C-0905535EDL in the Northern District of California, the statement ‘For reasons still unknown to Defendant, Clearwell truncated some ‘Re-auth’ documents during processing’ should be corrected.  Microsoft subsequently learned that the cause of the truncation was the Microsoft software (AD/RMS Bulk Protection Tool) employed to decrypt previously encrypted content, and the truncation issue had nothing to do with Clearwell’s technology whatsoever.  Shortly after Mr. Hwang’s declaration was filed, he clarified – on the record in open court on February 22 – that Microsoft’s decryption process was the true cause of the data truncation:

“A lot of Microsoft documents, including e-mails, are encrypted when they are sent. And for production purposes, we have to decrypt it. In that process, some of the material got cut off.”

Microsoft does not use Clearwell technology to decrypt its data.  In actuality, Clearwell’s Engineering and Support teams were instrumental in helping to identify the root cause of the truncation issue.  Microsoft continues to use Clearwell’s processing and analysis technology on this matter and greatly appreciates the partnership and support Clearwell provides without fail.

How Do You Sample Electronically Stored Information (ESI) in E-Discovery?

Wednesday, February 9th, 2011

When confronted with an almost impossible data analysis problem, a tried and true technique to solve it has been the use of sampling. The mathematical analysis behind sampling is something that has been studied for quite a number of years. Sampling has also been put into practice for well over seventy years in many fields, from predicting the results of elections to assessing the quality of electric bulbs. Why not do the same for certifying your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the Electronic Discovery Reference Model (EDRM) Search Group authors covered in detail, with a strategy in a comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). And, while much of that work is still to hit the mainstream litigation scene as a general practice, I was pleasantly surprised to see it receive attention from a fellow blogger and litigator, Nick Brestoff, who highlighted this in a very thoughtfully crafted article in Law.com, titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties in getting a certifiable result that attorneys can stand behind. And, it is highly likely that the current practice is to certify your electronic discovery without a real measure of validity behind it.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the one who has the data, i.e., the producing party. While the Federal Rules of Civil Procedure (FRCP) Rule 26(a) lists required disclosures as well as signing and certification guidelines per Rule 26(g), there is no agreed upon way to specify sampling parameters as well as the results of sampling. It is in this context that Nick Brestoff’s article is significant – it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that there is a logistical problem with this: most litigators will balk at producing the largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to specify in their Rule 26(f) meet and confer that the production be certified for completeness by also including a statement on sampling and its results. A simple request such as, “Sample the data for 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing side can perform random sampling, per the sampling goals for the above request, selecting 13526 documents (based on the sampling table of the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off on an agreed-upon target.
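As a rough illustration of where a number like 13526 can come from, here is a minimal sketch of the textbook sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². The assumptions are mine and not stated in the post or confirmed against the EDRM table: the worst-case proportion p = 0.5, a critical value rounded to z = 2.326 for 98% confidence, a very large document population (no finite-population correction), and the “2% error rate” read as a total interval width of 2%, i.e. a margin of plus or minus 1%. Under those assumptions the formula reproduces the figure quoted above; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided critical value of the standard normal (0.98 -> about 2.326)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def sample_size(z: float, margin: float, p: float = 0.5) -> int:
    """Documents to sample so a proportion is estimated within +/- margin.

    Assumes a very large population and uses p = 0.5 as the worst case
    unless a better prior estimate of the responsive rate is supplied.
    """
    return math.ceil(z * z * p * (1.0 - p) / (margin * margin))


if __name__ == "__main__":
    # Assumption: "2% error" means a +/-1% margin, with z rounded to 2.326.
    print(sample_size(2.326, 0.01))                   # 13526
    # Same request with the unrounded critical value.
    print(sample_size(z_for_confidence(0.98), 0.01))  # 13530
    # If "2% error" were instead read as a +/-2% margin.
    print(sample_size(2.326, 0.02))                   # 3382
```

The spread between these results is a reminder that the parties would need to agree on what “error rate” means before a sample size can be certified against it.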

In addition to the EDRM Search Guide, The Sedona Conference Working Group Commentary, Achieving Quality in the E-Discovery Process, is an indispensable resource for understanding the role of sampling. This paper discusses at length several sampling methods and their applicability for various purposes, including certifying that the results meet certain quality criteria. In addition, a number of electronic discovery cases have mentioned sampling as a way of overcoming the explosion of data volumes. A primary application of sampling is for evaluating proportionality claims, something that has moved from a simple assertion into an informed argument, with specificity on proving cost burden. Let’s examine a few.

Referring to the well-known Zubulake v. UBS Warburg, F.R.D. 280, the court in Makrakis v. Demelis, No. 09-706-C, 2010 WL 3004337 (July 13, 2010) ordered the producing party to essentially sample just a small number of backup tapes, at the expense of the requesting party. The decision is also notable for shifting the cost of processing and reviewing the sample, however small, to the requesting party. Such measures, while reducing the overall costs of e-discovery, place a greater burden of sample selection on the requesting party, forcing them to apply the reasonableness evaluation.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery is appropriate and quoted an earlier case, S.E.C. v. Collins & Aikman Corp., 256 F.R.D. 403, 418 (S.D.N.Y. 2009), for the proposition that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling recommendation in this instance was both a reduction in the number of custodians from forty to three and a significant reduction in the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced drastically to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to run a small set of search terms against a smaller number of custodians so as to get a sense for the larger electronic discovery costs. Clearone Communications v. Chiang offers another example of sampling by the use of Boolean logic to combine more common search terms, thereby avoiding over-inclusiveness.

Per the Sedona commentary definitions, this type of sampling is referred to as “judgmental sampling” wherein the practitioner has a general sense of which custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery sampling can embrace the benefits of statistical sampling as well. It is a natural next step, as even with the narrow sampling criteria of judgmental sampling, the cost of review can be high. One area where statistical sampling has an advantage is that quantifiable measures of error and confidence intervals are possible, while judgmental sampling has no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality and if the producing party needs a basis for certifying their productions, statistical sampling can be a powerful aid.
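To make the “quantifiable measures of error and confidence intervals” concrete, here is a minimal sketch of statistical sampling: draw a simple random sample of document IDs, have reviewers mark each sampled document responsive or not, and report the estimated responsive rate with a normal-approximation confidence interval. This is an illustration under my own assumptions rather than a method taken from the Sedona commentary or the EDRM Search Guide; the function names, the fixed seed, and the 98% critical value z = 2.326 are hypothetical choices.

```python
import math
import random


def draw_sample(doc_ids, n, seed=0):
    """Pick n distinct document IDs uniformly at random (reproducible by seed)."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), n)


def estimate_responsive_rate(labels, z=2.326):
    """Return (estimate, lower, upper) for the share of responsive documents.

    labels holds one True/False review decision per sampled document.
    Uses the normal approximation, which is reasonable for large samples
    and rates that are not extremely close to 0 or 1.
    """
    n = len(labels)
    p_hat = sum(labels) / n
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


if __name__ == "__main__":
    sampled = draw_sample(range(1_000_000), 1000)
    # Hypothetical review outcome: 200 of the 1000 sampled documents responsive.
    labels = [True] * 200 + [False] * 800
    estimate, lower, upper = estimate_responsive_rate(labels)
    print(len(sampled), round(estimate, 3), round(lower, 3), round(upper, 3))
```

Recording the seed alongside the sample is one simple way to let the other side, or the court, reproduce exactly which documents were drawn.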

The Perils of Data Collection in High Stakes Litigation: Which Approach Is Right For Your Organization?

Monday, February 7th, 2011

Many organizations involved in litigation, investigations, or audits struggle to meet deadlines for collecting and producing electronically stored information (ESI) from employees without breaking the budget.  The biggest challenges are typically faced by large organizations with multiple offices and large numbers of employees.  However, even smaller organizations with few offices face challenges if they have remote employees or employees who travel frequently, aka road warriors.  In this first of a two-part series, I’ll discuss when and why organizations should choose a manual collection process.  Part two will discuss the advantages and disadvantages of two automated data collection approaches.

In each situation, the organization is faced with a request for ESI and some portion of the potentially relevant ESI is located in remote offices or on laptops used by road warriors.  Preserving and collecting ESI across multiple systems such as email and file servers, archival systems, Microsoft SharePoint, and personal computers can be challenging whether these systems are located centrally or in the cloud.  Common challenges include:

  • Pressing deadlines
  • Risk of data loss or deletion
  • Failure to produce responsive data without legal justification
  • Lack of information technology (IT) department resources
  • Miscommunication between the IT and legal departments

These challenges are compounded for organizations with remote offices or road warriors because more coordination and effort is inevitably required, thereby increasing expenses and the risk of failure.   The key to success is determining which data collection approach is best for your organization.  First, let’s discuss the traditional manual approach.

The Traditional Manual Approach

There are two different manual data collection approaches that organizations utilize with varying degrees of success: employee self-collection and IT assisted collection.

Employee Self-Collection

The various data collection approaches often begin as part of an investigation, litigation, or audit that requires the identification of employees likely to have data relevant to a particular matter.   Those employees, or data custodians as they’re called, are asked to forward or copy any relevant ESI they possess to a centralized location or storage device where the data is stored for later analysis and review by the legal team.  One problem with this approach is that copying files could result in metadata information such as document dates being altered.  Another problem with this approach is that custodians’ memories fade over time and they may forget to produce relevant ESI.  Even worse, a custodian with a personal stake in the investigation may intentionally delete the very files being requested in an effort to thwart the investigation.  These scenarios could result in the organization facing sanctions or penalties, making employee self-collection a potentially risky and costly approach in almost any situation involving multiple custodians, offices, or large amounts of data.

IT Assisted Collections

The IT assisted collection approach is another manual approach that eliminates some of the risks associated with the employee self-collection method, but this approach often presents different challenges and often leads to “over collection” of ESI.  Typically one or more employees in the IT or IT Security Department are instructed to collect data from employees believed to have information relevant to a particular case.  To avoid overlooking or losing data, the IT resources collect data from numerous locations using computers loaded with specialized collection software.   Data to be collected from each relevant employee often resides on numerous devices including laptops, desktops, file servers, email servers, and other sources.   Once all the data for each custodian is collected from each data source, the data is copied and consolidated to a removable hard drive or drives where it awaits future processing, analysis, and review by the legal department.   Unfortunately for the IT department, this entire process is repeated for every new case and often results in a significant loss of productivity.

IT assisted collections were once the norm because this process was thought to represent the most efficient and effective way to avoid the risk of sanctions posed by the employee self-collection approach.   However, this approach is quickly falling out of vogue for two reasons:

First, IT assisted collections can increase the time, cost, and risk associated with data collection because the use of different technology tools can be challenging.   Organizations applying the IT assisted collection approach typically rely on off-the-shelf software such as Guidance EnCase, Robocopy, ExMerge, AccessData’s Forensic Toolkit (FTK) or other tools to collect data from each relevant custodian. Frequently, different tools are utilized to collect data from different data sources.  For example, it is not uncommon for the IT department to use ExMerge to collect from Microsoft Exchange, Robocopy to collect from file servers, EnCase to collect from laptops and desktops, and even other proprietary tools to collect data found in commonly used archives.  In addition to being time consuming, utilizing multiple tools to collect and consolidate data results in licensing, training, and maintenance costs for each product.  The risk of data loss or alteration is also heightened since data collected from multiple tools must eventually be exported and consolidated for further processing, analysis, and review.  Lastly, using multiple IT staff with varying levels of expertise to collect data arguably increases the risk of metadata being altered and complicates the ability to maintain accurate chain of custody logs.  In practice, many organizations using multiple collection tools spend countless hours trying to manually maintain chain of custody reports using Excel spreadsheets while other organizations simply neglect or ignore chain of custody requirements.  Each of these situations virtually invites evidentiary attacks by savvy opponents.

The second reason IT assisted collections are falling into disfavor is because the approach often results in the over collection of data.  To avoid the risk of sanctions or penalties resulting from data loss or deletion, sometimes entire laptop and desktop hard drives are copied or “imaged” (frequently called a “forensic image”).  Similarly, IT resources are often incentivized to “copy everything” simply to avoid being forced to revisit data sources from which data has already been partially collected in response to a new request for information.

The IT assisted approach of forensically imaging drives can be effective in limited situations including criminal investigations and intellectual property theft cases since these matters sometimes require the recovery and analysis of deleted files, internet browsing history, and other non-user generated files for a discrete number of custodians.  However, since most large matters do not require this degree of data recovery for most data sources, unnecessarily collecting data by making forensic images often results in a significant waste of time and money.

Which Approach is Right for Your Organization?

The risks and expenses associated with both manual approaches described above are often so high that organizations sometimes decide it is economically more efficient to settle lawsuits even when the lawsuit lacks merit.  This untenable position has led many organizations to seek more efficient, repeatable, and automated methods to manage data collection.  These automated approaches will be explored in my next post.

Ruling the World of Information Management and Electronic Discovery

Wednesday, November 17th, 2010

If you’re anything like Dr. Evil, Tears for Fears, or Napoleon, ruling the world is at or near the top of your to-do list, and part of ruling the world is having as omniscient a knowledge as possible of what’s going on, in order to better control it. Ruling the world has also long been the dream of many software vendors, who want to own and understand all the information in an enterprise in order to, um, provide maximum value to their customers… oh, and also to lock them in to a single underlying platform that allows them to control as much of the organization’s information management decisions as possible.

In some cases, these dual interests are aligned. However, in e-discovery, it’s not so clear. Over the last couple of years, many vendors have pushed a notion of “index everything” or so-called “proactive” e-discovery, in which you have instant access to all the information in your enterprise, in real-time, from which to drive your e-discovery process. But is this feasible? Or even desirable?

The Myth of the Silver Bullet

It can be tempting for IT to turn to an enterprise search solution that can index all data sources – laptops, desktops, file servers, SharePoint servers, databases, email archives, content management systems – and enable e-discovery across the entire enterprise in an instant. The reality is that while such a solution may work for enterprise search in small and medium-sized companies with a finite scope of data, the level of complexity in scale and defensibility of operations makes this simply not an achievable approach for e-discovery at most large enterprises. As Anne Kershaw and Joe Howie of the Electronic Discovery Institute noted in their just-published Judges’ Guide to Cost-Effective E-Discovery:

“There is no single silver bullet that solves all problems associated with escalating discovery costs and delays. As noted above, the single most effective cost reduction method is the focused collection of records most likely to contain relevant information. Some argue that e‐discovery is best accomplished by taking large amounts of data from clients and then applying keyword or other searches or filters. While, in some rare cases, this method might be the only option, it is also apt to be the most expensive. In fact, keyword searching against large volumes of data to find relevant information is a challenging, costly, and imperfect process. A much better approach is to ask key client contacts to help you locate core relevant information and then, by reading that information, determine other sources of relevant information.”

What are the specific reasons why a targeted collection approach is superior? From our conversations with clients as we have been developing our solution to this problem over the last couple of years, three major drawbacks to the index-everything approach stand out.

1. Impact to Existing IT Environment

While the collect-and-preserve approach employed by Clearwell is widely accepted for e-discovery, index-everything and preserve-in-place solutions have recently emerged, originating from other enterprise applications such as knowledge management and enterprise search. These approaches from other domains have significant disadvantages when applied to e-discovery, including impact to existing IT infrastructure and processes that result in increased cost and complexity. For instance, the scope of e-discovery can exceed the amount of information being indexed by knowledge management or enterprise search applications. According to Forrester, the majority of enterprise search implementations range in size from the hundreds of thousands to tens of millions of records, not the billions of documents that are potentially discoverable during litigation. Consequently, index-everything solutions must index a much larger volume of data across a broader range of applications and data stores than would typically be necessary for enterprise search.

Indexing such a large amount of data has implications for the entire IT environment. These solutions either crawl data repositories over the network or employ agents on local desktops and laptops to find new and modified files. IT organizations using these solutions report experiencing disruptions including:

• Requiring read access and permissions to numerous line-of-business applications and storage systems where data resides

• Significant increases to disk I/O for enterprise applications, network file shares, and client machines

• Increased network consumption as large amounts of data are read over the network

• Increased consumption of local hard drive space on employee desktops and laptops for search indexes and redundant copies of preserved files

• Scheduling resource-intensive indexing tasks during off-peak hours, impacting the ability of IT departments to complete backups during shrinking backup windows

Taken together, these issues add cost and complexity to the deployment of index-everything and preserve-in-place solutions. This often results in organizations not fully deploying the solution after purchasing licenses and spending months or years trying to integrate with their existing systems.

2. Risk of Missing Critical Data

Another key concern of organizations seeking to meet e-discovery requests is the ability to find all relevant files and documents for a case. Missing even a few important documents may result in multimillion-dollar fines and sanctions. UBS and Morgan Stanley paid $29.2 million and $12.5 million, respectively, for losing key files during litigation. It is therefore critically important that e-discovery solutions be able to index and search not only common file types, but also a range of less common but equally important files such as those within nested container files, encrypted files, and TIFF images containing text. Solutions that originate from applications outside the e-discovery domain often skip these files because 100% accuracy is not required for other applications such as enterprise search. Across organizations with billions of documents, there may be hundreds of thousands of potentially relevant files that remain in the dark and unknown to legal teams because they are not indexed.

Index corruption is another commonly reported issue with index-everything solutions that results in incomplete search results. Search indexes are susceptible to data corruption just like any other computer file, but the large size of indexes containing billions of records increases the probability of errors. In fact, this is a common problem of most archive solutions and other solutions that manage billions of records. A corrupt search index will result in incomplete results or in the worst case scenario, the inability to conduct searches until the index is repaired. In some situations, data must be re-indexed to rebuild a corrupt search index which is time consuming due to the slow speed of some solutions.
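A back-of-the-envelope calculation shows why index size matters here. The sketch below assumes, purely for illustration, that each record has a small and independent chance of being corrupted. The function name and the error rate are hypothetical and are not drawn from any vendor's data.

```python
import math

def prob_any_corruption(records, per_record_error_rate):
    """Probability that at least one of `records` entries is corrupt,
    assuming independent errors (an illustrative assumption only)."""
    # 1 - (1 - p)^n, computed with log1p/expm1 for numerical stability
    return -math.expm1(records * math.log1p(-per_record_error_rate))

ASSUMED_ERROR_RATE = 1e-9  # hypothetical per-record corruption chance

for records in (1_000_000, 100_000_000, 1_000_000_000):
    p = prob_any_corruption(records, ASSUMED_ERROR_RATE)
    print(f"{records:>13,} records: {p:.1%} chance of at least one corrupt entry")
```

Under these assumed numbers, the chance of hitting at least one corrupt entry climbs steeply as the index grows from millions to billions of records, which is the pattern described above.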

The net result is that in-place solutions increase the likelihood of missing critical data, exposing the organization to considerable legal and financial risk.

3. Time Delays and Uncertainty in Searches

When embarking on a project to make all enterprise data searchable for e-discovery, an important consideration is indexing speed in relation to total outstanding data and projected data growth. Organizations deploying such a solution typically have a large amount of existing data that needs to be indexed, and this index must be continually updated as data is modified and new data is created. Many companies report that although vendors claim high processing rates, these high rates erode over time as companies index greater amounts of their existing data, increasing the size of search indexes. Beyond an application’s ability to index data, there are exogenous factors affecting indexing performance including network speed, disk I/O, and latency. Along with index size and the number of search indexes, these factors can also affect search query performance, resulting in searches that take hours or days to return results.

Another issue facing organizations deploying index-everything solutions is that end users may be creating and modifying documents faster than the solution can index them. As a result, there is a widening gap between the state of data in the wild and the solution’s picture of that data, leading to incomplete search results. Equally troubling, search results may include files that were moved after the search engine indexed them, and so they appear in the results but cannot be viewed, retrieved, or preserved. End users clicking on the link to an item may receive an error similar to the “404 Error: File Not Found” that everyone has experienced when browsing the web. This presents a significant defensibility problem in e-discovery, and IT teams often end up tracking down these missing files one-by-one to ensure they are preserved. The result is that organizations may be exposed to unnecessary legal risk while IT teams have the additional burden of manually tracking down hundreds of files for each legal matter.
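The widening gap between data in the wild and the index can be illustrated with a simple model. This is a minimal sketch, assuming constant daily rates of document creation and indexing; the function name and all figures are hypothetical and chosen only to show the effect.

```python
def indexing_backlog(existing_docs, created_per_day, indexed_per_day, days):
    """Return the number of unindexed documents at the end of each day."""
    backlog = existing_docs
    history = []
    for _ in range(days):
        # New and modified documents add to the backlog; indexing drains it.
        backlog = max(0, backlog + created_per_day - indexed_per_day)
        history.append(backlog)
    return history

# Hypothetical figures: one million unindexed documents to start with.
falling_behind = indexing_backlog(1_000_000, 60_000, 50_000, days=5)
catching_up = indexing_backlog(1_000_000, 40_000, 50_000, days=5)

print("Creation outpaces indexing:", falling_behind)
print("Indexing outpaces creation:", catching_up)
```

Whenever the creation rate exceeds the indexing rate, the backlog only grows, so searches run against an increasingly incomplete picture of the data. Even when indexing is faster, a large starting backlog can take a long time to clear.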

A Better Approach to Collection and Preservation

Recognizing the challenges of collection and preservation, Clearwell has developed a targeted approach that enables organizations to defensibly collect and preserve data without increasing the work of IT or exposing the organization to risk. Targeted collection provides an easy way for IT or Legal teams to collect from all critical data sources and securely manage collected data in a preservation store for the duration of a case. Unlike index-everything and preserve-in-place approaches, Clearwell is up and running quickly, delivering value in hours or days without the cost and complexity of lengthy multi-month deployment timelines. In addition, Clearwell’s targeted collect-and-preserve approach has a number of benefits over in-place approaches:

Minimal impact to IT infrastructure: Clearwell only collects potentially relevant data from custodians involved in a case or investigation, targeting resources at the most important data instead of wasting resources on indexing all data across the entire organization. As a result, targeted collection has less impact on existing applications and storage systems, does not cause significant increases to disk I/O or network consumption, and does not require agents to be installed on client machines or servers.

Finds all critical data: Purpose-built to support the complex and difficult to read file types required by e-discovery, Clearwell can index and search all critical content such as nested container files, encrypted files, images containing text, and hidden content.

Up-to-date collection: Clearwell collects all relevant data for e-discovery by targeting information that is related to custodians in the case. Because this approach is not limited by legacy indexing approaches, Clearwell is able to collect data that has been recently modified or moved.

Maintains existing workflow: With Clearwell, end users are able to continue using their existing workflows and business processes without interruption. Using targeted collection, Clearwell can collect data in the background without altering data where it resides. When users create or modify files in the normal course of business, Clearwell incrementally collects new data automatically.

Reduces risk: Targeted collection significantly reduces the risk of spoliation by retaining data in a secure preservation store, providing a defensible process that maintains chain of custody. As a result, data cannot be tampered with by end users or accidentally lost on laptops, desktops, or other data repositories not under the control of IT.

Collecting and preserving evidence are critical steps in the e-discovery process. Solutions that promote indexing everything as the optimal solution for your e-discovery problems might be conceptually promising, but create new challenges for IT and increase risk in practice. As a result, organizations are seeking a solution that enables them to respond effectively to e-discovery without causing major disruptions or exposing the organization to additional risk. Clearwell’s targeted approach solves the challenges of collection and preservation by making it easy to collect data from all critical data sources and preserve data defensibly, without incurring greater risk or disrupting the organization’s business processes.

How to Reduce E-Discovery Costs Part V: What Part of E-Discovery To Bring In-House

Thursday, December 10th, 2009

Part IV of this series on reducing e-discovery costs described how bringing e-discovery in-house can reduce costs.  One of the major decision points when in-sourcing e-discovery is to decide which parts of the e-discovery process should be in-sourced.  In making this decision, each company should look at the nature of their e-discovery process today, which parts of the e-discovery workflow they currently perform in-house, if any at all, and which are currently outsourced.  They should then look at which outsourced parts would produce the best return on investment (ROI) if in-sourced.

When most companies look at their current litigation software process, they often find that they are already in-sourcing the first stages of e-discovery: identification, preservation and collection.  While there are some companies that will occasionally outsource these steps, especially when there is a need to perform forensic collections, most sizable companies are already doing most of these steps themselves, though often advised by outside counsel.  For example, most companies will identify the custodians and sources of electronically stored information (ESI) in conjunction with outside counsel.  Litigation hold notices will be sent internally and data will be collected by the company’s IT, legal IT and/or internal forensic/investigations team.  It is typically at this point that e-discovery moves outside the company as the data is transferred to a litigation support service provider and/or law firm who perform processing, analysis, review, and production.

When a company takes a look at how they can reduce their e-discovery costs, they are most often looking at two high-level options:

  1. Whether they can streamline their existing internal identification, preservation and collection processes
  2. Whether they should bring processing, analysis, review and/or production in-house

There are of course exceptions to this.  Some companies do outsource their collection for example, especially when collection might need to be done in remote offices.  But the majority of companies seem to fall in the above categories.  Distinguishing these two options is important because the ROI analysis and decision-making process related to streamlining an existing process is very different than the analysis and decision-making related to bringing a process in-house.

When performing an ROI analysis of these different options, one typically comes to two conclusions.  The first is that both are often ROI positive projects.  The second is that in-sourcing some aspects of processing, analysis and review is far and away the biggest “bang for the buck” project that most companies can undertake when it comes to reducing e-discovery costs.  The biggest reason for the second conclusion is that the majority of the costs incurred during e-discovery are processing and review costs.  In a previous post where we analyzed e-discovery costs, we found that processing and review typically represent over 90% of these costs.  As a result, in-sourcing some or all aspects of processing, analysis and review can save very significant amounts of external processing fees and attorney review costs.  In contrast, while there can be real savings to improving and automating identification, preservation and collection, the size of savings pales in comparison because these steps represent less than 10% of the total cost of e-discovery.
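The comparison above can be made concrete with a little arithmetic. The sketch below uses the 90%/10% cost split cited in this post; the total spend and the savings rate are hypothetical figures chosen only for illustration, and the function name is my own.

```python
def project_savings(total_cost, cost_share, savings_rate):
    """Savings from a project that trims `savings_rate` off the part of
    e-discovery spend represented by `cost_share`."""
    return total_cost * cost_share * savings_rate

TOTAL_COST = 1_000_000   # hypothetical annual e-discovery spend, in dollars
SAVINGS_RATE = 0.30      # hypothetical: each project cuts its area by 30%

# Cost shares from the post: processing/review ~90%, everything upstream ~10%.
insourcing = project_savings(TOTAL_COST, 0.90, SAVINGS_RATE)
streamlining = project_savings(TOTAL_COST, 0.10, SAVINGS_RATE)

print(f"In-source processing, analysis and review: ${insourcing:,.0f}")
print(f"Streamline identification, preservation and collection: ${streamlining:,.0f}")
```

Even if both projects achieve the same percentage improvement, the one applied to the larger share of spend yields roughly nine times the savings under these assumptions. Both can be ROI positive, but the sizes differ sharply.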

The best approach to reducing e-discovery costs, of course, would be to do both of these projects: improve identification, preservation and collection as well as in-source processing, analysis and review.  However, if you have to sequence these projects or pick only one (a popular requirement in this economy) then in-sourcing processing, analysis and review is the one to pick.

Litigation and E-Discovery Trend Surveys Find Similar Results

Thursday, November 19th, 2009

As the Mark Twain quote goes, there are “lies, damn lies and statistics.”  In this case, however, and regardless of the exact numbers, two recent surveys provide some very interesting directional trending.  The first is Fulbright & Jaworski’s 6th Annual Litigation Trends Survey.  In addition to covering a range of general and vertically oriented topics, they also focus on e-discovery specifically.  Not surprisingly, reducing e-discovery costs bubbles up to the top of the list as a major initiative for most respondents.  Interestingly though, remediation plans attacking this problem seem to fall into two different camps.  On the one hand, 24% of respondents plan on outsourcing certain e-discovery tasks, further leveraging preferred partners.  Conversely, the method that leads the pack (at a whopping 47%) is the corporate initiative of taking components of e-discovery in-house.  Other methods were listed, but most didn’t appear to have critical mass, including: using clawback agreements more, enforcing document retention policies, and negotiating with the opposition over the scope of discovery.

Similarly, Clearwell Systems recently conducted a survey in partnership with analyst firm Enterprise Strategy Group titled Trends in Electronic Discovery – A Market Perspective, which attempted to pinpoint similar pain points and solutions. The questions focused more on 2010 planning, and the survey found a general expectation of more litigation and regulatory inquiries: 53% of respondents expect the number of lawsuits and regulatory inquiries to increase by at least 20% in 2010, with 13% of respondents planning for an increase of 50% or more.  Again, not surprisingly, many plan on attacking this increase in litigation (and the corresponding e-discovery costs) by bringing parts of the process in-house.  In fact, 48% indicated that they currently have an active project to bring segments of the e-discovery process in-house. And for those that aren’t currently in the building process, 87% of respondents plan to budget for technology that specifically supports the electronic discovery process in 2010.

Given the length of time required for planning, RFPs and e-discovery tool procurement, clearly time is of the essence for companies that want to take advantage of internal solutions in the 2010 time frame.  Failure to get off the dime means that an enterprise is more likely to get caught in the middle of deliberation, versus deployment.

EMC Acquires Kazeon For $75 million To Round-Out SourceOne Archiving & E-Discovery Solution

Tuesday, September 1st, 2009

“Large storage vendor buys small electronic discovery software company to round-out broader corporate initiative.” That was the story in December 2007, when Seagate bought e-discovery company Metalincs for its i365 solution; and, it’s the same story today as EMC announced its acquisition of Kazeon for its SourceOne archiving solution. The terms of the EMC-Kazeon deal were not disclosed, but sources with knowledge of the transaction tell me that the acquisition price is approximately $75 million. That’s slightly less than what Seagate paid for Metalincs ($82 million), and less than what FTI Consulting paid for Attenex ($88 million). But it’s well within the usual range of $50-100 million that most acquirers pay for technology that has not yet matured into a business.

The deal will come as a relief to Kazeon’s long-suffering shareholders. The company was founded in 2003 and, over the past 6 years, it raised over $60 million in equity financing, double the amount it usually takes successful software companies to reach profitability. But despite all that investment, revenue has been hard to come by. According to former Kazeon employees, the company’s revenue totaled only $7 million over the past 12 months. Perhaps as a result, there’s been a lot of management turnover, and last year the board retained a recruiter to find a new CEO. In light of all that, selling the company for $75 million, or roughly 10 times trailing revenue, is a great outcome for Kazeon’s shareholders. It also provides some level of job security for Kazeon’s employees, many of whom have been offered retention bonuses to stick around.

On the other side of the coin, the deal also makes sense for EMC, which needed to flesh out SourceOne, its recent re-branding of the Email Extender archive. In launching SourceOne in April 2009, EMC described it as an integrated portfolio of products: SourceOne Email Management for email archiving; Discovery Manager for legal holds of email; Celerra and Centera for storage; and Discovery Collector for identifying and collecting data from desktops and file shares. EMC owned all of those products except one: Discovery Collector, which instead was to come from EMC Select Partner, StoredIQ. It is widely known that EMC tried repeatedly to acquire StoredIQ but was rebuffed. So instead, it purchased Kazeon (i.e., the Kazeon Information Server) so that it now owns all aspects of SourceOne and does not have to rely on partners.

Will this eDiscovery deal be successful? We will have to wait and see, but Seagate’s experience is not encouraging. A year after it acquired Metalincs, Seagate laid off most of the staff and hired UBS to help it sell what was left of the electronic discovery company. There have not been any takers.

EDRM Continues Drive to Solve Practical Electronic Discovery Problems

Tuesday, June 23rd, 2009

As most electronic discovery veterans are aware, the EDRM Project is an effort founded five years ago by George Socha and Tom Gelbmann to bring together a community of e-discovery practitioners for the purpose of solving some of the industry’s most challenging problems.

It may be hard to believe, but there was a time in the very recent past when the iconic EDRM model did not yet exist. No multicolored boxes, no arrows, no sloping volume and relevance lines — nothing. Coming up with a standard way of talking about electronic discovery was the first problem that the group set about solving, and I think it would be hard to argue with the fact that they came up with the gold standard: a simple, clear, concise model that, at least so far, is standing the test of time as a way of thinking about the flow of the e-discovery process.

With each passing year, the group has started to address a broader set of problems, all with a practical bent.  Currently, there are eight projects, each with its own goal:

  • Evergreen: Keep the EDRM model fresh and relevant as the industry grows and evolves
  • XML: Provide a standard, generally-accepted XML schema to facilitate the movement of electronically stored information from one step of the e-discovery process to the next
  • Metrics: Provide an effective means of measuring the time, money, and volumes associated with e-discovery activities
  • Code of Conduct: Develop aspirational voluntary ethical guidelines for e-discovery providers and consumers
  • Search: Provide a framework for defining and managing the various aspects of search as it applies to the e-discovery workflow
  • Data Set: Compile a 100 gigabyte public data set that can be used to test various aspects of e-discovery software and services
  • Jobs: Provide a professional resource for the e-discovery community and communicate about e-discovery related jobs
  • Information Management: Explore the emerging need for e-discovery standards in information management (the “upstream” part of the process)

This year’s annual EDRM conference took place back in May. After years of meeting in the same chilly and wind-swept location in downtown St. Paul, Minnesota, George and Tom had the brilliant idea of spicing up the meeting a bit by moving it to a more exotic locale: Bora Bora! Plans were set in motion, but quickly the overwhelming feedback came back from EDRM members: E-discovery is so fascinating, so heart-warming, that adding Bora Bora to the mix would simply be too much for the vast majority of the participants to bear. So St. Paul it was!

This was Clearwell’s third EDRM conference, and location aside, it’s been fascinating to see how it has changed over the last few years. Here are several notable trends from this year’s kickoff:

  • More participation from end-users: There was a definite increase in the number of end-user/consumer participants (that is, those not from the vendor community), particularly from law firms. This could be taken as further evidence that e-discovery is indeed moving in-house.
  • Increased enthusiasm to take on new challenges: One of the great things about EDRM is its willingness to try to tackle new areas that aren’t being directly addressed by some of the other (fantastic) organizations out there like Sedona. This was in evidence several years ago, when Clearwell was fortunate to get involved in the early stages of the EDRM XML project, which has proven to be a huge time, cost, and risk reducer for many in the industry by providing a common standard that can be used to move data within the e-discovery process. It was in evidence last year when Clearwell’s CTO was able to help launch a new effort around Search that is seeking to develop standards and best practices in an increasingly complex and contentious area. And, finally, it was in evidence this year with the launch of the Information Management project, a cutting-edge group that is exploring how to solve the challenges that e-discovery poses for information management – certainly a complex area in need of thought leadership.
  • Improved collaboration: One thing that has amazed us from day one is how collaborative EDRM is, and continues to become. There are a lot of e-discovery vendors involved who, outside of the confines of the St. Paul Hotel, aggressively compete in the marketplace. However, George and Tom have been able to create an environment at EDRM where competitive spirits are set aside and ideas can be cultivated which provide huge value across the e-discovery landscape (both vendor and consumer).

One final note: If you’re an e-discovery practitioner in a law firm or corporate setting, I’d encourage you to get connected, either informally (through the EDRM web site) or formally (by signing up for one or more of the projects). While end-user involvement continues to grow, there is definitely still a need for more non-vendor involvement. It is critical in ensuring real and relevant problems get solved, and to pushing the state of the art in e-discovery forward. Please join us!