Archive for the ‘ediscovery in the cloud’ Category

Ruling the World of Information Management and Electronic Discovery

Wednesday, November 17th, 2010

If you’re anything like Dr. Evil, Tears for Fears, or Napoleon, ruling the world is at or near the top of your to-do list, and part of ruling the world is having as omniscient a knowledge as possible of what’s going on, in order to better control it. Ruling the world has also long been the dream of many software vendors, who want to own and understand all the information in an enterprise in order to, um, provide maximum value to their customers… oh, and also to lock them in to a single underlying platform that allows them to control as much of the organization’s information management decisions as possible.

In some cases, these dual interests are aligned. However, in e-discovery, it’s not so clear. Over the last couple of years, many vendors have pushed a notion of “index everything” or so-called “proactive” e-discovery, in which you have instant access to all the information in your enterprise, in real-time, from which to drive your e-discovery process. But is this feasible? Or even desirable?

The Myth of the Silver Bullet

It can be tempting for IT to turn to an enterprise search solution that can index all data sources – laptops, desktops, file servers, SharePoint servers, databases, email archives, content management systems – and enable e-discovery across the entire enterprise in an instant. The reality is that while such a solution may work for enterprise search in small and medium-sized companies with a finite scope of data, the level of complexity in scale and defensibility of operations makes this simply not an achievable approach for e-discovery at most large enterprises. As Anne Kershaw and Joe Howie of the Electronic Discovery Institute noted in their just-published Judges’ Guide to Cost-Effective E-Discovery:

“There is no single silver bullet that solves all problems associated with escalating discovery costs and delays. As noted above, the single most effective cost reduction method is the focused collection of records most likely to contain relevant information. Some argue that e‐discovery is best accomplished by taking large amounts of data from clients and then applying keyword or other searches or filters. While, in some rare cases, this method might be the only option, it is also apt to be the most expensive. In fact, keyword searching against large volumes of data to find relevant information is a challenging, costly, and imperfect process. A much better approach is to ask key client contacts to help you locate core relevant information and then, by reading that information, determine other sources of relevant information.”

What are the specific reasons why a targeted collection approach is superior? From our conversations with clients as we have been developing our solution to this problem over the last couple of years, three major drawbacks to the index-everything approach stand out.

1. Impact to Existing IT Environment

While the collect-and-preserve approach employed by Clearwell is widely accepted for e-discovery, index-everything and preserve-in-place solutions have recently emerged, originating from other enterprise applications such as knowledge management and enterprise search. These approaches from other domains have significant disadvantages when applied to e-discovery, including impact to existing IT infrastructure and processes that result in increased cost and complexity. For instance, the scope of e-discovery can exceed the amount of information being indexed by knowledge management or enterprise search applications. According to Forrester, the majority of enterprise search implementations range in size from the hundreds of thousands to tens of millions of records, not the billions of documents that are potentially discoverable during litigation. Consequently, index-everything solutions must index a much larger volume of data across a broader range of applications and data stores than would typically be necessary for enterprise search.

Indexing such a large amount of data has implications for the entire IT environment. These solutions either crawl data repositories over the network or employ agents on local desktops and laptops to find new and modified files. IT organizations using these solutions report experiencing disruptions including:

• Requiring read access and permissions to numerous line-of-business applications and storage systems where data resides

• Significant increases to disk I/O for enterprise applications, network file shares, and client machines

• Increased network consumption as large amounts of data are read over the network

• Increased consumption of local hard drive space on employee desktops and laptops for search indexes and redundant copies of preserved files

• Scheduling resource-intensive indexing tasks during off-peak hours, impacting the ability of IT departments to complete backups during shrinking backup windows

Taken together, these issues add cost and complexity to the deployment of index-everything and preserve-in-place solutions. This often results in organizations not fully deploying the solution after purchasing licenses and spending months or years trying to integrate with their existing systems.

2. Risk of Missing Critical Data

Another key concern of organizations seeking to meet e-discovery requests is the ability to find all relevant files and documents for a case. Missing even a few important documents may result in multimillion-dollar fines and sanctions. UBS and Morgan Stanley paid $29.2 million and $12.5 million, respectively, for losing key files during litigation. It is therefore critically important that e-discovery solutions have the ability to index and search not only common file types, but also a range of less common but equally important files such as those within nested container files, encrypted files, and TIFF images containing text. Solutions that originate from applications outside the e-discovery domain often skip these files because 100% accuracy is not required for other applications such as enterprise search. Across organizations with billions of documents, there may be hundreds of thousands of potentially relevant files which are in the dark and unknown to legal teams because they are not indexed.

Index corruption is another commonly reported issue with index-everything solutions that results in incomplete search results. Search indexes are susceptible to data corruption just like any other computer file, but the large size of indexes containing billions of records increases the probability of errors. In fact, this is a common problem of most archive solutions and other solutions that manage billions of records. A corrupt search index will result in incomplete results or, in the worst-case scenario, the inability to conduct searches until the index is repaired. In some situations, data must be re-indexed to rebuild a corrupt search index, which is time-consuming due to the slow speed of some solutions.

The net result is that in-place solutions increase the likelihood of missing critical data, exposing the organization to considerable legal and financial risk.

3. Time Delays and Uncertainty in Searches

When embarking on a project to make all enterprise data searchable for e-discovery, an important consideration is indexing speed in relation to total outstanding data and projected data growth. Organizations deploying such a solution typically have a large amount of existing data that needs to be indexed, and this index must be continually updated as data is modified and new data is created. Many companies report that although vendors claim high processing rates, these high rates erode over time as companies index greater amounts of their existing data, increasing the size of search indexes. Beyond an application’s ability to index data, there are exogenous factors affecting indexing performance including network speed, disk I/O, and latency. Along with index size and the number of search indexes, these factors can also affect search query performance, resulting in searches that take hours or days to return results.
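The relationship between indexing speed, outstanding data, and data growth can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only an illustration: the function name and all of the figures are hypothetical, and it assumes constant rates, which the paragraph above suggests is optimistic since real indexing rates tend to erode as indexes grow.

```python
import math


def days_to_clear_backlog(backlog_gb, index_rate_gb_per_day, growth_gb_per_day):
    """Estimate days until the index has caught up with existing data.

    Returns None when new data arrives at least as fast as it is indexed,
    because in that case the backlog never shrinks.
    """
    net_rate = index_rate_gb_per_day - growth_gb_per_day
    if net_rate <= 0:
        return None
    return math.ceil(backlog_gb / net_rate)


# Hypothetical figures: 50 TB of existing data, 200 GB of new data per day.
print(days_to_clear_backlog(50_000, 500, 200))  # 167
print(days_to_clear_backlog(50_000, 250, 200))  # 1000
print(days_to_clear_backlog(50_000, 150, 200))  # None
```

Under these assumed numbers, halving the effective indexing rate stretches the catch-up time from under six months to nearly three years, and a rate below the growth rate means the index never catches up at all.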

Another issue facing organizations deploying index-everything solutions is that end users may be creating and modifying documents faster than the solution can index them. As a result, there is a widening gap between the state of data in the wild and the solution’s picture of that data, leading to incomplete search results. Equally troubling, search results may include files that were moved after the search engine indexed them, and so they appear in the results but cannot be viewed, retrieved, or preserved. End users clicking on the link to an item may receive an error similar to the “404 Error: File Not Found” that everyone has experienced when browsing the web. This presents a significant defensibility problem in e-discovery, and IT teams often end up tracking down these missing files one-by-one to ensure they are preserved. The result is that organizations may be exposed to unnecessary legal risk while IT teams have the additional burden of manually tracking down hundreds of files for each legal matter.
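To illustrate the "file not found" problem described above, here is a minimal sketch of how index entries can go stale. It is not how any particular product works; the function name and file names are made up, and the "index" is just a list of paths recorded at indexing time.

```python
import os
import tempfile


def find_stale_entries(indexed_paths):
    """Return the indexed paths that no longer exist on disk."""
    return [p for p in indexed_paths if not os.path.exists(p)]


with tempfile.TemporaryDirectory() as tmp:
    kept = os.path.join(tmp, "contract.docx")
    moved = os.path.join(tmp, "memo.docx")
    for path in (kept, moved):
        with open(path, "w") as fh:
            fh.write("placeholder")

    # The index records where files were at indexing time.
    index = [kept, moved]

    # A user renames one file after the index was built.
    os.rename(moved, os.path.join(tmp, "memo_v2.docx"))

    stale = find_stale_entries(index)
    print([os.path.basename(p) for p in stale])  # ['memo.docx']
```

The renamed file still exists, but the index points at its old location, so a search hit for it cannot be opened or preserved until someone tracks down where it went.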

A Better Approach to Collection and Preservation

Recognizing the challenges of collection and preservation, Clearwell has developed a targeted approach that enables organizations to defensibly collect and preserve data without increasing the work of IT or exposing the organization to risk. Targeted collection provides an easy way for IT or Legal teams to collect from all critical data sources and securely manage collected data in a preservation store for the duration of a case. Unlike index-everything and preserve-in-place approaches, Clearwell is up and running quickly, delivering value in hours or days without the cost and complexity of lengthy multi-month deployment timelines. In addition, Clearwell’s targeted collect-and-preserve approach has a number of benefits over in-place approaches:

Minimal impact to IT infrastructure: Clearwell only collects potentially relevant data from custodians involved in a case or investigation, targeting resources at the most important data instead of wasting resources on indexing all data across the entire organization. As a result, targeted collection requires less impact to existing applications and storage systems, does not cause significant increases to disk I/O or network consumption, and does not require agents to be installed on client machines or servers.

Finds all critical data: Purpose-built to support the complex and difficult to read file types required by e-discovery, Clearwell can index and search all critical content such as nested container files, encrypted files, images containing text, and hidden content.

Up-to-date collection: Clearwell collects all relevant data for e-discovery by targeting information that is related to custodians in the case. Because this approach is not limited by legacy indexing approaches, Clearwell is able to collect data that has been recently modified or moved.

Maintains existing workflow: With Clearwell, end users are able to continue using their existing workflows and business processes without interruption. Using targeted collection, Clearwell can collect data in the background without altering data where it resides. When users create or modify files in the normal course of business, Clearwell incrementally collects new data automatically.

Reduces risk: Targeted collection significantly reduces the risk of spoliation by retaining data in a secure preservation store, providing a defensible process that maintains chain of custody. As a result, data cannot be tampered with by end users or accidentally lost on laptops, desktops, or other data repositories not under the control of IT.

Collecting and preserving evidence are critical steps in the e-discovery process. Solutions that promote indexing everything as the optimal solution for your e-discovery problems might be conceptually promising, but create new challenges for IT and increase risk in practice. As a result, organizations are seeking a solution that enables them to respond effectively to e-discovery without causing major disruptions or exposing the organization to additional risk. Clearwell’s targeted approach solves the challenges of collection and preservation by making it easy to collect data from all critical data sources and preserve data defensibly, without incurring greater risk or disrupting the organization’s business processes.

Top Five Predictions in Electronic Discovery

Monday, November 15th, 2010

What’s next in the electronic discovery world?  Well, it’s nearly impossible to say with too much precision, but my recent e-discovery trends article attempts to peer into the crystal ball to divine some hints about the future.

The following five predictions are what I expect to create the biggest waves in e-discovery in 2011.  Most are nascent trends that we’ve seen a bit of in 2010, but that should continue to accelerate next year.  Enterprises that can prepare for and understand these areas will be well equipped to continue taking a proactive approach to the ever-changing challenges of e-discovery.

  1. Changes in Forensic Best Practices: In 2011, manual forensic imaging will continue to take a backseat to more automated, forensically sound data collection techniques.  Forensic (bit for bit) images have long been the gold standard for the legally sound collection of ESI in response to legal proceedings.  And, while forensic imaging will continue to be important in a number of discrete situations (fraud, misappropriation of trade secrets cases, etc.), it will largely be seen as overkill in basic electronic discovery cases.  Since imaging is both time consuming and highly manual, automated collection tools will increasingly be used by savvy organizations to speed up and streamline the collection process.
  2. Consolidation in the Electronic Discovery Industry: Consolidation in the electronic discovery sector will impact market forces and the balance of power.  The past year saw traditional, pure-play electronic discovery companies looking (sometimes successfully and sometimes not) for diversification and deep pockets.  In the upcoming year, the relative dearth of pure play EDD companies may reverse the downward price pressure that’s been seen over the past several years.
  3. Proportionality Becomes Reality: Burgeoning data volumes, as seen in multi-terabyte (versus gigabyte) cases, mean that the legal community will continue to search for ways to prevent electronic discovery costs from exceeding legal exposure and attorneys’ fees.  Groups like The Sedona Conference will continue to push for better clarification within the community surrounding “proportionality” in order to keep the electronic discovery “tail” from wagging the litigation “dog.”  If successful at all, there may be a slight respite for litigious enterprises that may be able to better scale e-discovery efforts with the risk profile of the matter at hand.
  4. Collision of Cloud, Social Media and E-Discovery: The seemingly unstoppable migration of corporate data to the cloud, combined with the proliferation of social media applications, will continue to stress electronic discovery practitioners as they attempt to preserve, collect, search, and process electronically stored information (ESI) from sources that aren’t traditionally managed behind the firewall.  Proactive enterprises will increasingly evaluate the legal and compliance risks of storing data in the cloud so that they’re not painted into a corner when they need to preserve, collect, and produce offsite ESI.
  5. Global E-Discovery Matures: International jurisdictions will increasingly look to the United States (and the Federal Rules of Civil Procedure) as their nascent electronic discovery paradigms are increasingly stressed by the proliferation of both ESI and discovery disputes.  The recent Goodale case out of the UK (and impending procedural changes to the e-Disclosure Practice Direction) demonstrates how the global community is rapidly maturing along the electronic discovery continuum.

While the tools and best practices designed to combat top ediscovery hurdles continue to mature, the challenges are multiplying at an equally fast rate.  In the past, the crux of most discovery matters usually centered around email and sometimes instant messaging.  In 2011, new problems will continue to crop up on the horizon, such as collecting SharePoint data from the cloud, trying to extract structured data from a range of proprietary systems and capturing ephemeral ESI from an ever changing array of social media applications.

Please let me know if you disagree with any of the predictions or have any others you’d like to share.

E-Discovery and the Cloud: The Duty to Preserve Electronically Stored Information (ESI)

Friday, May 28th, 2010

One of the new buzz words of the last few years in computing has been Cloud Computing. After the initial hype, and the subsequent shakeout of its potential, everyone is beginning to recognize that it represents a paradigm shift in how we purchase, deploy, and utilize computing resources. The general impetus for the cloud has been its potential to reduce capital costs, offer flexibility in purchasing computing resources, and reduce operational costs in maintaining hardware resources.

A lot of what the cloud offers is achievable using existing technologies, but repurposed in new and innovative ways. Several forms of the cloud, with specific benefits to customers, are being packaged and promoted. The offerings are delivered as cloud services, such as Platform as a Service (PaaS), Infrastructure as a Service (IaaS) and Software as a Service (SaaS). Without getting into specifics, each service offering comes with a set of service agreements between the purchaser and provider of the cloud services.

As with any new initiative, there are new challenges to contend with including security and compliance with corporate policies and industry regulations.  Although these issues are substantial, for this article, let us consider the legal implications as they relate to electronic discovery. We all know that sooner or later, every organization faces litigation, and increasingly, a fair number of those cases involve e-discovery. Traditionally, in-house legal and IT teams have had an understanding of how to respond to legal requests and have focused on litigation readiness. But, how do these translate to the new cloud computing paradigm? I’ll examine some of the challenges in a series of posts on e-discovery and the cloud. For starters, let’s analyze the challenges and considerations inherent with the duty to preserve electronically stored information (ESI).

Duty to Preserve ESI

Before we get to the mechanics of electronic discovery and actual preparation for the Rule 26(f) conference, the duty to preserve arises. The duty to preserve may be triggered when a legal proceeding is “reasonably anticipated” and increases in importance on receipt of pre-litigation correspondence or a similar trigger event. Traditionally, such a duty to preserve is reflected by placing litigation holds. It is often the case that litigation holds are placed on at least a portion of the ESI well ahead of an actual triggering event. See Adams v. Dell as perhaps an extreme example. In fact, some organizations invest in litigation support software technologies for classifying data and placing holds on the most reasonable subset.

How does such a litigation hold translate into the cloud? As a customer of a cloud, one should craft service agreements to dedicate certain cloud-resident data, in the form of folders or other broad categories, to be preserved. If the cloud provider has deployed technology to ensure that no party within the customer’s user community can delete the preserved data, it is well and good. However, placing such restrictive access impedes the normal running of the business, and becomes impractical. Essentially, data in the cloud that is available for the normal course of business is in the hands of user-custodians. If they then delete the data either deliberately, or inadvertently, or through normal business functions, that data deletion is subject to spoliation claims. Even though the “safe harbor” from spoliation sanctions of Rule 37(f) applies when information is lost due to the “routine, good faith” operation of electronic information systems, when a preservation order is in place, shelter under 37(f) is not possible. Thus, the actual implementation of a litigation hold comes under scrutiny. Because of this, many implementations adopt preservation using a “copy and preserve” model. However, this model is at odds with live business data that is constantly evolving. Even if the latest point-in-time snapshot technology at the physical volume is employed, the result is inadequate – you end up preserving massive volumes of data in the cloud, unrelated to actual logical messages or files that need to be preserved. What is needed is some smartness in the form of an application in the cloud itself that can translate a litigation hold request into specific ESI in the cloud. Who owns and manages this application and what the service levels are for this application is a significant issue.
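As a rough illustration of what "translating a litigation hold request into specific ESI" might look like, the sketch below selects individual items by custodian and date range rather than snapshotting a whole volume. Everything here is an assumption made for the example: the `Item` record, the `items_for_hold` function, and the sample inventory are hypothetical, and a real hold would typically involve more criteria than these two.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """A single logical message or file stored in the cloud (hypothetical)."""
    item_id: str
    custodian: str
    modified: date


def items_for_hold(items, custodians, start, end):
    """Return IDs of items owned by the named custodians within the date range."""
    return [
        item.item_id
        for item in items
        if item.custodian in custodians and start <= item.modified <= end
    ]


inventory = [
    Item("msg-001", "alice", date(2010, 3, 1)),
    Item("msg-002", "bob", date(2010, 3, 5)),
    Item("doc-017", "alice", date(2009, 11, 20)),
]

# Hold request: everything from custodian "alice" in the first half of 2010.
print(items_for_hold(inventory, {"alice"}, date(2010, 1, 1), date(2010, 6, 30)))
# ['msg-001']
```

The point of the sketch is the granularity: the hold resolves to a list of logical items, not to the physical storage they happen to live on.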

Now, the view from the cloud provider’s perspective is very different. In light of the flexible data management architectures available, there is a great temptation to share both data with a litigation hold and data without a litigation hold on the same physical infrastructure. As a result, the cloud provider preserves all data from every customer that is resident on that infrastructure – a very conservative approach. As a consequence, this would preserve another customer’s ESI accidentally and that data is now discoverable, in the context of a different litigation, despite the second customer’s active management of the data. Preserving a set of live, constantly changing data in the context of a single enterprise is technically difficult; doing so across multiple customers sharing the data infrastructure is exponentially harder.

Another related issue with preservation is the need for the ability to release preservation holds. Typically, when the litigation response team determines that the legal hold is no longer necessary, the hold is released. In the “copy and preserve” model of litigation hold, one has to verify that the released ESI does not overlap with other litigation holds before it is marked for destruction. One of the benefits of the cloud is the flexibility in storing bits and pieces of data wherever data capacity is available. Applying the release can again be tricky for both the cloud customer and the cloud provider.
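The overlap check on release can be sketched with plain set arithmetic. This is a simplified illustration only; the hold names, item IDs, and the `releasable_items` function are hypothetical, and it assumes each hold is already resolved to a set of item identifiers.

```python
def releasable_items(holds, released_hold):
    """Return items under `released_hold` that no other active hold covers."""
    still_held = set()
    for name, items in holds.items():
        if name != released_hold:
            still_held |= items
    return holds[released_hold] - still_held


# Hypothetical holds, each mapped to the item IDs it preserves.
holds = {
    "matter_a": {"msg-001", "msg-002", "doc-017"},
    "matter_b": {"msg-002", "doc-044"},
}

# Releasing matter_a must not free msg-002, which matter_b still needs.
print(sorted(releasable_items(holds, "matter_a")))  # ['doc-017', 'msg-001']
```

Only the items returned here could be considered for destruction; anything shared with another matter stays preserved until that hold is released too.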

Given these additional complexities of evidence in the cloud and the fact that the duty to preserve may arise well before the trigger event of litigation, the costs associated with the duty to preserve can add up very quickly. It’s essential to understand three critical items related to the duty to preserve in the cloud: 1) what the cloud provider would charge for ongoing preservation, 2) whether agreements with the cloud provider cover the legal issues raised by the duty to preserve and 3) what the cloud provider offers in terms of a flexible workflow for applying and releasing legal holds.
