
Posts Tagged ‘search’

New Gartner Report Spotlights Significance of Email Archiving for Defensible Deletion

Thursday, November 1st, 2012

Gartner recently released a report that spotlights the importance of using email archiving as part of an organization’s defensible deletion strategy. The report – Best Practices for Using Email Archiving to Eliminate PST and Mailbox Quota Headaches (Alan Dayley, September 21, 2012) – specifically focuses on the information retention and eDiscovery challenges associated with email storage on Microsoft Exchange and how email archiving software can help address these issues. As Gartner makes clear in its report, an archiving solution can provide genuine opportunities to reduce the costs and risks of email hoarding.

The Problem: PST Files

The primary challenge that many organizations are experiencing with Microsoft Exchange email is the unchecked growth of messages stored in personal storage table (PST) files. Used to bypass storage quotas on Exchange, PST files are problematic because they increase the costs and risks of eDiscovery while circumventing information retention policies.

That the unrestrained growth of PST files could create problems downstream for organizations should come as no surprise. Various court decisions have addressed this issue, with the DuPont v. Kolon Industries litigation foremost among them. In the DuPont case, a $919 million verdict and 20-year product injunction largely stemmed from the defendant's inability to prevent the destruction of thousands of pages of email formerly stored in PST files. That spoliation resulted in an adverse inference instruction to the jury and the ensuing verdict against the defendant.

The Solution: Eradicate PSTs with the Help of Archiving Software and Retention Policies

To address the PST problem, Gartner suggests following a three-step process to help manage and then eradicate PSTs from the organization. This includes educating end users regarding both the perils of PSTs and the ease of access to email through archiving software. It also involves disabling the creation of new PSTs, a process that should ultimately culminate with the elimination of existing PSTs.

In connection with this process, Gartner suggests deployment of archiving software with a “PST management tool” to facilitate the eradication process. With the assistance of the archiving tool, existing PSTs can be discovered and migrated into the archive’s central data repository. Once there, email retention policies can begin to expire stale, useless and even harmful messages that were formerly outside the company’s information retention framework.
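As a rough illustration of the discovery step, a simple script can inventory PST files scattered across file shares before migration. This is a minimal sketch assuming read access to the relevant shares; the paths and the CSV report are hypothetical and not tied to any particular archiving product's PST management tool.

```python
import csv
import os

def inventory_pst_files(root_dirs, report_path="pst_inventory.csv"):
    """Walk the given directories and record every PST file found.

    The resulting CSV (path, size in MB, last-modified timestamp) can feed
    a migration queue for whatever PST collection tool is ultimately used.
    """
    with open(report_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_mb", "last_modified"])
        for root in root_dirs:
            for dirpath, _dirnames, filenames in os.walk(root):
                for name in filenames:
                    if name.lower().endswith(".pst"):
                        full_path = os.path.join(dirpath, name)
                        info = os.stat(full_path)
                        writer.writerow([full_path,
                                         round(info.st_size / 2**20, 1),
                                         info.st_mtime])

# Example (hypothetical shares):
# inventory_pst_files([r"\\fileserver\home", r"C:\Users"])
```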

With respect to the development of retention policies, organizations should consider engaging in a cooperative internal process involving IT, compliance, legal and business units. These key stakeholders must be engaged and collaborate if workable policies are to be created. The actual retention periods should take into account the types of email generated and received by an organization, along with the enterprise's business, industry and litigation profile.

To ensure successful implementation of such retention policies and also address the problem of PSTs, an organization should explore whether an on-premises or cloud archiving solution is a better fit for its environment. While each method has its advantages, Gartner advises organizations to consider whether certain key features are included with a particular offering:

Email classification. The archiving tool should allow your organization to classify and tag emails in accordance with your retention policy definitions, including user-selected, user/group, or keyword tagging (a simple rule-based sketch follows this list).

User access to archived email. The tool must also give end users appropriate and user-friendly access to their archived email, thus eliminating concerns over their inability to manage their email storage with PSTs.

Legal and information discovery capabilities. The search, indexing, and e-discovery capabilities of the archiving tool should also match your needs or enable integration into corporate e-discovery systems.
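The sketch below makes the classification idea concrete with a simple rule-based tagger that assigns a retention category by keyword or sender group. The categories, keywords and retention periods are illustrative assumptions only; real retention schedules come from legal, compliance and the business.

```python
from dataclasses import dataclass

@dataclass
class RetentionRule:
    tag: str            # retention category applied to matching email
    years: int          # illustrative retention period
    keywords: tuple     # any match in subject/body triggers the rule
    groups: tuple = ()  # optional sender/recipient groups that trigger the rule

# Hypothetical policy definitions -- not a recommended schedule.
RULES = [
    RetentionRule("finance-record", 7, ("invoice", "purchase order"), ("finance",)),
    RetentionRule("contract", 10, ("agreement", "statement of work")),
    RetentionRule("routine", 2, ()),  # default bucket
]

def classify(subject: str, body: str, sender_group: str) -> RetentionRule:
    """Return the first rule whose keywords or groups match the message."""
    text = (subject + " " + body).lower()
    for rule in RULES:
        keyword_hit = any(k in text for k in rule.keywords)
        group_hit = sender_group in rule.groups
        if keyword_hit or group_hit:
            return rule
    return RULES[-1]  # fall back to the default bucket

print(classify("Q3 invoice attached", "see attached", "finance").tag)  # finance-record
```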

While perhaps not a panacea for the storage and eDiscovery problems associated with email, on-premises or cloud archiving software should provide various benefits to organizations. Indeed, such technologies have the potential to help organizations store, manage and discover their email efficiently, cost effectively and in a defensible manner. Where such solutions are properly deployed and fully implemented, organizations should be able to reduce the nettlesome costs and risks connected with email.

Defensible Deletion: The Cornerstone of Intelligent Information Governance

Tuesday, October 16th, 2012

The struggle to stay above the rising tide of information is a constant battle for organizations. Not only are the costs and logistics associated with data storage more troubling than ever, but so are the potential legal consequences. Indeed, the news headlines are constantly filled with horror stories of jury verdicts, court judgments and unreasonable settlements involving organizations that failed to effectively address their data stockpiles.

While there are no quick or easy solutions to these problems, an increasingly popular method for effectively dealing with these issues is an organizational strategy referred to as defensible deletion. A defensible deletion strategy can take many forms. But at its core, defensible deletion is a comprehensive approach that companies implement to reduce the storage costs and legal risks associated with the retention of electronically stored information (ESI). Organizations that have done so have been successful in avoiding court sanctions while at the same time eliminating ESI that has little or no business value.

The first step to implementing a defensible deletion strategy is for organizations to ensure that they have a top-down plan for addressing data retention. This typically requires that their information governance principals – legal and IT – are cooperating with each other. These departments must also work jointly with records managers and business units to decide what data must be kept and for what length of time. All such stakeholders in information retention must be engaged and collaborate if the organization is to create a workable defensible deletion strategy.

Cooperation between legal and IT naturally leads the organization to establish records retention policies, which carry out the key players' decisions on data preservation. Such policies should address the particular needs of an organization while balancing them against litigation requirements. Not only will that enable a company to reduce its costs by decreasing data proliferation, it will also minimize the company's litigation risks by allowing it to limit the amount of potentially relevant information available for current and follow-on litigation.

In like manner, legal should work with IT to develop a process for how the organization will address document preservation during litigation. This will likely involve the designation of officials who are responsible for issuing a timely and comprehensive litigation hold to custodians and data sources. This will ultimately help an organization avoid the mistakes that often plague document management during litigation.

The Role of Technology in Defensible Deletion

In the digital age, an essential aspect of a defensible deletion strategy is technology. Indeed, without innovations such as archiving software and automated legal hold acknowledgements, it will be difficult for an organization to achieve its defensible deletion objectives.

On the information management side of defensible deletion, archiving software can help enforce an organization's retention policies and thereby reduce data volume and related storage costs. This can be accomplished with classification tools, which intelligently analyze and tag data content as it is ingested into the archive. By so doing, organizations may retain information that is significant or that otherwise must be kept for business, legal or regulatory purposes – and nothing else.

An archiving solution can also reduce costs through efficient data storage. By expiring data in accordance with the organization's retention policies and by using single instance storage to eliminate ESI duplicates, archiving software frees up space on company servers for the retention of other materials and ultimately leads to decreased storage costs. It also lessens litigation risks by removing data that would otherwise be available for future litigation.
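Single instance storage is commonly implemented by keeping each unique message body or attachment only once, keyed by a content hash, with individual messages holding references to the stored blob. The sketch below is a toy illustration of that idea, not a description of how any particular archive implements it.

```python
import hashlib

class SingleInstanceStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self._blobs = {}   # content hash -> bytes
        self._refs = {}    # content hash -> reference count

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content          # first copy is stored
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest                              # messages keep only this pointer

    def expire(self, digest: str) -> None:
        """Called when a retention policy expires a message referencing this blob."""
        self._refs[digest] -= 1
        if self._refs[digest] == 0:                # last reference gone
            del self._blobs[digest], self._refs[digest]

store = SingleInstanceStore()
a = store.put(b"quarterly report attachment")
b = store.put(b"quarterly report attachment")      # duplicate: no extra storage used
assert a == b and len(store._blobs) == 1
```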

On the eDiscovery side of defensible deletion, an eDiscovery platform with the latest in legal hold technology is often essential for enabling a workable litigation hold process. Effective platforms enable automated legal hold acknowledgements from various custodians across multiple cases. This allows organizations to confidently place data on hold through a single user action and eliminates concerns that ESI may slip through the proverbial cracks of manual hold practices.
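Conceptually, automated hold tracking amounts to issuing a hold per matter, fanning it out to custodians, and recording acknowledgements so that gaps are visible for follow-up. The sketch below is a hypothetical illustration of that bookkeeping, not any platform's actual API.

```python
from datetime import datetime

class LegalHold:
    def __init__(self, case_name, custodians):
        self.case_name = case_name
        self.issued_at = datetime.utcnow()
        # custodian -> acknowledgement timestamp (None until acknowledged)
        self.acknowledgements = {c: None for c in custodians}

    def acknowledge(self, custodian):
        self.acknowledgements[custodian] = datetime.utcnow()

    def outstanding(self):
        """Custodians who have not yet confirmed the hold -- follow-up targets."""
        return [c for c, ts in self.acknowledgements.items() if ts is None]

hold = LegalHold("Acme v. Example Corp.", ["jsmith", "mlee", "tchan"])
hold.acknowledge("jsmith")
print(hold.outstanding())  # ['mlee', 'tchan']
```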

Every day, organizations experience the costly consequences of delaying implementation of a defensible deletion program. This trend can be reversed through a common sense defensible deletion strategy which, when powered by effective, enabling technologies, can help organizations decrease the costs and risks associated with the information explosion.

Responsible Data Citizens Embrace Old World Archiving With New Data Sources

Monday, October 8th, 2012

The times are changing rapidly as the data explosion mushrooms, but the more things change the more they stay the same. In the archiving and eDiscovery world, organizations are increasingly pushing content from multiple data sources into information archives. Email was the first data source to take the plunge into the archive, but other data sources are following quickly as we increase the amount of data we create (volume) along with the types of data sources (variety). While email is still a paramount data source for litigation, internal/external investigations and compliance – other data sources, namely social media and SharePoint, are quickly catching up.

This transformation is happening for multiple reasons. The main reason for this expansive push of different data varieties into the archive is that centralizing an organization's data is paramount to healthy information governance. For organizations that have deployed archiving and eDiscovery technologies, the ability to archive multiple data sources is the Shangri-La they have been looking for to increase efficiency, as well as to create a more holistic and defensible workflow.

Organizations can now deploy document retention policies across multiple content types within one archive and can identify, preserve and collect from the same, singular repository. No longer do separate retention policies need to apply to data that originated in different repositories. The increased ability to archive more data sources into a centralized archive provides for unparalleled storage, deduplication, document retention, defensible deletion and discovery benefits in an increasingly complex data environment.

Prior to this capability, SharePoint was another data source in the wild that needed disparate treatment. This meant that in-place legal hold, as well as insight into the corpus of data, was not as clear as it was for email. This lack of transparency within the organization's data environment for early case assessment led to unnecessary outsourcing, over-collection and disparate, time-consuming workflows. All of these detractors cost organizations money, resources and time that could be better utilized elsewhere.

Bringing data sources like SharePoint into an information archive increases an organization's ability to comply with necessary document retention schedules and legal hold requirements, and to reap the benefits of a comprehensive information governance program. If SharePoint is where an organization's employees are storing documents that are valuable to the business, order needs to be brought to the repository.

Additionally, many projects are abandoned and left to die on the vine in SharePoint. These projects need to be expired and that capacity must be recycled for a higher business purpose. Archives can now capture document libraries, wikis, discussion boards, custom lists, “My Sites” and SharePoint social content for increased storage optimization, retention/expiration of content and eDiscovery. As a result, organizations can better manage complex projects such as migrations, versioning, site consolidations and expiration with SharePoint archiving.

Data can be analogized to a currency, where the archive is the bank. In treating data as a currency, organizations must ask themselves: why are companies valued the way they are on Wall Street? Companies that provide services, or services in combination with products, are often valued on customer lists, consumer data that can be repurposed (Facebook), and various other databases. A recent Forbes article discusses people, value and brand as predominant indicators of value.

While these valuation metrics are sound, they stop short of measuring the quality of the actual data within an organization and whether it is organized and protected. They also do not consider the risks and benefits of how the data is stored and protected, or whether it is searchable. The value of the data inside a company is what supports all three of the aforementioned valuations without exception. Without managing the data in an organization, not only are eDiscovery and storage costs a legal and financial risk, but all three of those valuation drivers are compromised.

If employee data is not managed/monitored appropriately, if the brand is compromised due to lack of social media monitoring/response, or if litigation ensues without the proper information governance plan, then value is lost because value has not been assessed and managed. Ultimately, an organization is only as good as its data, and this means there’s a new asset on Wall Street – data.

Archiving email is not a new concept, and in turn it isn't novel that data is an asset. It has simply been a less understood asset because, even though massive amounts of data are created each day in organizations, storage has become cheap. SharePoint is becoming more archivable because more critical data is being stored there, including business records, contracts and social media content. Organizations tend not to fear what they cannot see, until an event forces them to go back and collect, analyze and review that data. Costs associated with this reactive eDiscovery process can range from $3,000 to $30,000 per gigabyte, compared to roughly 20 cents per gigabyte for storage. These downstream eDiscovery costs add up quickly, especially as organizations begin to deal in terabytes and zettabytes.
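As a back-of-the-envelope illustration of that gap, using the per-gigabyte figures above and an assumed (purely hypothetical) 500 GB pulled into a reactive review:

```python
gigabytes = 500                      # assumed volume pulled into a reactive review
storage_cost = gigabytes * 0.20      # ~20 cents per GB simply to store it
review_low = gigabytes * 3_000       # low end of the quoted eDiscovery range
review_high = gigabytes * 30_000     # high end of the quoted range

print(f"Storage: ${storage_cost:,.0f}")                         # Storage: $100
print(f"Reactive review: ${review_low:,.0f} - ${review_high:,.0f}")
# Reactive review: $1,500,000 - $15,000,000
```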

Hence, plus ça change, plus c'est la même chose, and we will see this trend continue as organizations push more valuable data into the archive and expire data that has no value. Multiple data sources have been collection sources for some time, but the ease of pulling everything into an archive is allowing for economies of scale and increased defensibility regarding data management. This will decrease the risks associated with litigation and compliance, as well as boost the value of companies.

From A to PC – Running a Defensible Predictive Coding Workflow

Tuesday, September 11th, 2012

So far in our ongoing predictive coding blog series, we’ve touched on the “whys” and “whats” of predictive coding, and now I’d like to address the “hows” of using this new technology. Given that predictive coding is groundbreaking technology in the world of eDiscovery, it’s no surprise that a different workflow is required in order to run the review process.

The traditional linear review process utilizes a “brute force” approach of manually reading each document and coding it for responsiveness and privilege. In order to reduce the high cost of this process, many organizations now farm out documents to contract attorneys for review. Contract attorneys, however, often possess less expertise and knowledge of the issues, which means that multiple review passes, along with additional checks and balances, are needed to ensure review accuracy. This process commonly results in a significant number of documents being reviewed multiple times, which in turn increases the cost of review. When you step away from an “eyes-on review” of every document and use predictive coding to leverage the expertise of more experienced attorneys, you will naturally aim to review as few documents as possible in order to achieve the best possible results.

How do you review the minimum number of documents with predictive coding? For starters, organizations should prepare their case for predictive coding by performing an early case assessment (ECA) in order to cull down to the review population prior to review. While some may suggest that predictive coding can be run without any ECA up front, you will actually save a significant amount of review time if you put in the effort to cull out the profoundly irrelevant documents in your case. Doing so will prevent a “junk in, junk out” situation, where leaving too much junk in the case results in needlessly reviewing a number of junk documents throughout the predictive coding workflow.

Next, segregating documents that are unsuitable for predictive coding is important. Most predictive coding solutions leverage the extracted text content within documents to operate. That means any documents that do not contain extracted text, such as photographs and engineering schematics, should be manually reviewed so they are not overlooked by the predictive coding engine. The same concept applies to any other documents with limitations that prevent text analysis, such as encrypted and password-protected files. All of these documents should be reviewed separately so as not to miss any relevant documents.

After culling down to your review population, the next step in preparing to use predictive coding is to create a Control Set by drawing a randomly selected statistical sample from the document population. Once the Control Set is manually reviewed, it will serve two main purposes. First, it will allow you to estimate the population yield, otherwise referred to as the percentage of responsive documents contained within the larger population. (The size of the Control Set may need to be adjusted to ensure the yield is properly taken into account.) Second, it will serve as your baseline for a true “apples-to-apples” comparison of your prediction accuracy across iterations as you move through the predictive coding workflow. The Control Set only needs to be reviewed once up front to be used for measuring accuracy throughout the workflow.
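A minimal sketch of drawing a random Control Set and estimating yield from the reviewers' tags appears below. The population size, sample size and responsiveness counts are illustrative assumptions, and the margin of error uses the standard normal approximation for a proportion.

```python
import math
import random

def draw_control_set(doc_ids, sample_size, seed=42):
    """Simple random sample from the entire review population."""
    random.seed(seed)
    return random.sample(doc_ids, sample_size)

def estimate_yield(control_tags):
    """control_tags: list of True/False responsiveness calls from reviewers."""
    n = len(control_tags)
    p = sum(control_tags) / n                    # observed yield
    margin = 1.96 * math.sqrt(p * (1 - p) / n)   # ~95% confidence interval
    return p, margin

population = list(range(1_000_000))              # stand-in for document IDs
control = draw_control_set(population, 2_000)    # sample size chosen for illustration

# Suppose manual review finds 120 of the 2,000 control documents responsive:
tags = [True] * 120 + [False] * 1_880
p, m = estimate_yield(tags)
print(f"Estimated yield: {p:.1%} ± {m:.1%}")     # ~6.0% ± 1.0%
```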

It is essential that the documents in the Control Set are selected randomly from the entire population. While some believe that other sampling approaches give better peace of mind, they may actually result in unnecessary review. For example, other workflows recommend sampling from the documents that are not predicted to be relevant to see if anything was left behind. If you instead create a proper Control Set from the entire population, you can get the necessary precision and recall metrics that are representative of the entire population, which in turn includes the documents that are not predicted to be relevant.

Once the Control Set is created, you can begin training the software to evaluate documents against the review criteria in the case. Selecting the optimal set of documents to train the system (commonly referred to as the training set or seed set) is one of the most important steps in the entire predictive coding workflow, as it sets the initial accuracy of the system, and thus it should be chosen carefully. Some suggest creating the initial training set by taking a random sample from the population (much like how the Control Set is selected) instead of proactively selecting responsive documents. However, the important thing to understand is that the training set must contain enough responsive documents to accurately represent them. The reason selecting responsive documents for inclusion in the training set matters is that most eDiscovery cases have low yield, meaning the prevalence of responsive documents within the overall document population is low. The system will not be able to effectively learn how to identify responsive items if enough responsive documents are not included in the training set.

An effective method for selecting the initial training set is to use a targeted search to locate a small set of documents (typically between 100 and 1,000) that is expected to be about 50% responsive. For example, you may choose to focus on only the key custodians in the case and use a combination of tighter keyword, date range, and similar search criteria. You do not have to perform exhaustive searches, but a high quality initial training set will likely minimize the amount of additional training needed to achieve high prediction accuracy.
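A hedged sketch of that kind of targeted filter follows, assuming each document carries custodian, date and text fields; the custodian names, search terms and date range are purely hypothetical.

```python
from datetime import date

KEY_CUSTODIANS = {"ceo_assistant", "project_lead"}   # hypothetical key custodians
TERMS = ("rollout plan", "safety audit")             # hypothetical tight keywords

def candidate_training_docs(docs, start=date(2011, 1, 1), end=date(2011, 6, 30)):
    """Return a small, likely-rich candidate pool for the initial training set.

    Each doc is assumed to be a dict with 'custodian', 'date' and 'text' keys.
    """
    hits = []
    for d in docs:
        in_window = start <= d["date"] <= end
        on_point = any(t in d["text"].lower() for t in TERMS)
        if d["custodian"] in KEY_CUSTODIANS and in_window and on_point:
            hits.append(d)
    return hits[:1000]   # cap near the suggested 100-1,000 document range
```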

After the initial training set is selected, it must then be reviewed. It is extremely important that the review decisions made on any training items are as accurate as possible, since the system will be learning from these items, which typically means that the more experienced case attorneys should be used for this review. Once review is finished on all of the training documents, the system can learn from the tagging decisions in order to predict the responsiveness or non-responsiveness of the remaining documents.

While you can now predict on all of the other documents in the population, it is most important to predict on the Control Set at this time. Not only is this likely to be more time-efficient than applying predictions to all the documents in the case, but you will need predictions on all of the documents in the Control Set in order to assess the accuracy of the predictions. With predictions and tagging decisions on each of the Control Set documents, you will be able to get accurate precision and recall metrics that you can extrapolate to the entire review population.
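Those precision and recall numbers fall out directly from comparing the reviewers' tags to the system's predictions on the Control Set. A minimal sketch, assuming parallel lists of boolean calls:

```python
def precision_recall(truth, predicted):
    """truth, predicted: parallel lists of booleans for the Control Set.

    Precision = of the documents predicted responsive, how many truly are.
    Recall    = of the truly responsive documents, how many were predicted.
    """
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative control-set results:
truth     = [True, True, False, False, True, False, False, False]
predicted = [True, False, False, True, True, False, False, False]
print(precision_recall(truth, predicted))   # (0.666..., 0.666...)
```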

At this point, the accuracy of the predictions is likely to not be optimal, and thus the iterative process begins. In order to increase the accuracy, you must select additional documents to use for training the system. Much like the initial training set, this additional training set must also be selected carefully. The best documents to use for an additional training set are those that the system would be unable to accurately predict. Rather than choosing these documents manually, the software is often able to mathematically determine this set more effectively than human reviewers. Once these documents are selected, you simply continue the iterative process of training, predicting and testing until your precision and recall are at an acceptable point. Following this workflow will result in a set of documents identified to be responsive by the system along with trustworthy and defensible accuracy metrics.
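One common way a system picks the additional training documents described above is uncertainty sampling: choose the documents whose predicted probability of responsiveness sits closest to 0.5, where the model is least certain. The sketch below assumes the tool exposes a per-document score, which not every product does.

```python
def select_additional_training(scores, batch_size=200):
    """scores: dict mapping document ID -> predicted probability of responsiveness.

    Returns the documents the model is least certain about (closest to 0.5),
    which are typically the most informative ones to review and train on next.
    """
    ranked = sorted(scores, key=lambda doc_id: abs(scores[doc_id] - 0.5))
    return ranked[:batch_size]

scores = {"doc-1": 0.97, "doc-2": 0.52, "doc-3": 0.08, "doc-4": 0.49, "doc-5": 0.61}
print(select_additional_training(scores, batch_size=2))   # ['doc-4', 'doc-2']
```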

You cannot simply produce all of these documents at this point, however. The documents must still go through a privilege screen in order to remove any documents that should not be produced, as well as any other review measures that you usually take on your responsive documents. This does, however, open up the possibility of applying additional rounds of predictive coding on top of this set of responsive documents. For example, after running the privilege screen, you can train on the privilege tag and attempt to identify additional privileged documents in your responsive set that were missed.

The important thing to keep in mind is that predictive coding is meant to strengthen your current review workflows. While we have outlined one possible workflow that utilizes predictive coding, the flexibility of the technology lends itself to a multitude of other uses, including prioritizing a linear review. Whatever application you choose, predictive coding is sure to be an effective tool in your future reviews.

Mission Impossible? The eDiscovery Implications of the ABA’s New Ethics Rules

Thursday, August 30th, 2012

The American Bar Association (ABA) recently announced changes to its Model Rules of Professional Conduct that are designed to address digital age challenges associated with practicing law in the 21st century. These changes emphasize that lawyers must understand the ins and outs of technology in order to provide competent representation to their clients. From an eDiscovery perspective, such a declaration is particularly important given the lack of understanding that many lawyers have regarding even the most basic supporting technology needed to effectively satisfy their discovery obligations.

With respect to the actual changes, the amendment to the commentary language from Model Rule 1.1 was most significant for eDiscovery purposes. That rule, which defines a lawyer’s duty of competence, now requires that attorneys discharge that duty with an understanding of the “benefits and risks” of technology:

To maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology, engage in continuing study and education and comply with all continuing legal education requirements to which the lawyer is subject.

This rule certainly restates the obvious for experienced eDiscovery counsel. Indeed, the Zubulake series of opinions from nearly a decade ago laid the groundwork for establishing that competence and technology are irrevocably and inextricably intertwined. As Judge Scheindlin observed in Zubulake V, “counsel has a duty to effectively communicate to her client its discovery obligations so that all relevant information is discovered, retained, and produced.” This includes being familiar with client retention policies, in addition to its “data retention architecture;” communicating with the “client’s information technology personnel” and arranging for the “segregation and safeguarding of any archival media (e.g., backup tapes) that the party has a duty to preserve.”

Nevertheless, Model Rule 1.1 is groundbreaking in that it formally requires lawyers in those jurisdictions following the Model Rules to be up to speed on the impact of eDiscovery technology. In 2012, that undoubtedly means counsel should become familiar with the benefits and risks of predictive coding technology. With its promise of reduced document review costs and decreased legal fees, counsel should closely examine predictive coding solutions to determine whether they might be deployed in some phase of the document review process (e.g., prioritization, quality assurance for linear review, full scale production). Yet caution should also be exercised given the risks associated with this technology, particularly the well-known limitations of early generation predictive coding tools.

In addition to predictive coding, lawyers would be well served to better understand traditional eDiscovery technology tools such as keyword search, concept search, email threading and data clustering. Indeed, there is significant confusion regarding the continued viability of keyword searching given some prominent judicial opinions frowning on so-called blind keyword searches. However, most eDiscovery jurisprudence and authoritative commentators confirm the effectiveness of keyword searches that involve some combination of testing, sampling and iterative feedback.

Whether the technology involves predictive coding, keyword searching, attorney client privilege reviews or other areas of eDiscovery, the revised Model Rules appear to require counsel to understand the benefits and risks of these tools. Moreover, this is not simply a one-time directive. Because technology is always changing, lawyers should continue to stay abreast of changes and developments. This continuing duty of competence is well summarized in The Sedona Conference Best Practices Commentary on the Use of Search & Retrieval Methods in E-Discovery:

Parties and the courts should be alert to new and evolving search and information retrieval methods. What constitutes a reasonable search and information retrieval method is subject to change, given the rapid evolution of technology. The legal community needs to be vigilant in examining new and emerging techniques and methods which claim to yield better search results.

While the challenge of staying abreast of these complex technological changes is difficult, it is certainly not “mission impossible.” Lawyers untrained in the areas of technology have often developed tremendous skill sets required for dealing with other areas of complexities in the law. Perhaps the wise but encouraging reminder from Anthony Hopkins to Tom Cruise in Mission Impossible II will likewise spur reluctant attorneys to accept this difficult, though not impossible task: “Well this is not Mission Difficult, Mr. Hunt, it’s Mission Impossible. Difficult should be a walk in the park for you.”

Magic 8 Ball Predictions for eDiscovery in Florida: FRCP, FOIA and the Sunshine Laws

Thursday, August 23rd, 2012

The Sunshine State is shining a new ray of light on the information governance and eDiscovery space with new civil procedure rules addressing electronically stored information (ESI). The new rules, which go into effect September 1, 2012, are six years in the making and a product of many iterations and debate amongst practitioners, neutrals and jurists. While they generally mirror the Federal Rules of Civil Procedure (FRCP) and embrace much of Sedona's Cooperation Proclamation, there are some marked procedural differences, though they generally accomplish the same goals.

For example, instead of mandating a meet and confer conference (a la the FRCP), the new state rules provide for these negotiations in a case management conference pursuant to Rule 1.200-1.201. None of the Florida rules are a surprise since they wisely promote early discussions regarding potential discovery problems, understanding of information management systems, and competency on the part of lawyers and their clients to effectively address litigation hold practices and preservation – just as the FRCP do.

There are comprehensive blogs that have already covered the nuts and bolts of how the rules change the practice of law in Florida with regard to ESI, as well as a fantastic video featuring Judge Richard Nielsen, who piloted these principles in his Florida court. Perhaps the most interesting legal issues facing Florida have to do with the new rules intersecting with open government and record keeping, and what the government's burden will be going forward to produce metadata.

This is not to say the private sector won’t have to make changes as well, because anyone litigating in Florida should take eDiscovery seriously given recent cases like Coquina Investments v. Rothstein. In this case, Judge Marcia Cooke boldly sanctioned the defendant(s) and their lawyers for failing to preserve, search and produce information relevant to the case. One of the issues in the case involved format; paper documents were produced by the defendant when they should have been electronically produced with relevant metadata.

The federal government has had a brush with this nexus, although it remains unresolved. In the NDLON case, Judge Scheindlin initially ordered the government to produce select metadata, but subsequently retracted her ruling. Critics of the initial holding claim she confused the discovery requirements of the FRCP and the Freedom of Information Act (FOIA). While the two have different legal standards – reasonableness under FOIA and proportionality under the FRCP – this issue is a red herring.

The differing standards are not the true issue; the ability to conduct a thorough search to retrieve relevant information and produce metadata appropriately is the crux. FOIA is in many cases a more stringent standard than the FRCP, and this puts even more pressure on the government to improve its technology. The simple premise that documents should be produced in the manner in which they were created, or alternatively with all of the characteristics necessary to the merits of a proceeding, is not technologically difficult to attain. Nor is the redaction of sensitive information due to relevance or an exemption.

Florida’s most luminary legal contribution to information governance up until this point has been the most comprehensive body of legislation in the United States addressing the right to information and access to public records (Sunshine Laws). Early on, Florida embraced the concept that information created by the government needs to be accessible to the public, and has adopted policies and technologies to address this responsibility.

Florida has historically been the most transparent of all the states and proactive about clarifying how certain communications (specifically ESI) become public records. In the near future, these laws will further force Florida into becoming the most progressive state with regard to their information management and in-house eDiscovery capabilities. More than the laws being on the books, the sheer number of lawsuits increasingly involving the Sunshine Laws and ESI will be the impetus for much of this technological innovation.

Today we are in the age of information governance, and at the dawn of mainstream predictive coding for litigation. Increasingly, organizations are archiving information and deploying in-house eDiscovery capabilities, pursuing the promise of gaining control of their data, limiting risk, and deriving value from it. The fact that civil litigants frequently sue the government under FOIA and the Sunshine Laws creates a nexus that must and will be resolved in the near future.

The most brilliant part of the first NDLON ruling regarding metadata was that it spoke to the concept of the FRCP and FOIA being aligned. Both are requests for production, and while they have differing legal standards, it is inefficient to conduct those searches in different, unrelated ways once an information governance infrastructure has been implemented. When the two collide, a party must contend with both, and the new rules will bring this issue to a head. The tools used for a discovery request can and should be the same as those used to comply with a FOIA production – and they should be in place from the start. For a state like Florida, a case involving the Sunshine Laws will eventually put this question before a court, but now under more ESI-savvy rules. Florida cannot afford to reinvent the wheel or scramble to comply with requests; a proactive infrastructure needs to be in place.

Florida's new rules will impact all areas of state and local government, as well as state-funded educational institutions, in civil litigation. Questions about format, employee self-collection, retention and litigation hold are going to get very hot in the Sunshine State because the government is more accountable there. As Louis Brandeis said, “Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.” This may be a rare case of state case law driving federal rulemaking, coupled with a need for technological advancement on the government's part.

Gartner’s 2012 Magic Quadrant for E-Discovery Software Looks to Information Governance as the Future

Monday, June 18th, 2012

Gartner recently released its 2012 Magic Quadrant for E-Discovery Software, which is its annual report analyzing the state of the electronic discovery industry. Many vendors in the Magic Quadrant (MQ) may initially focus on their position and the juxtaposition of their competitive neighbors along the Vision and Execution axes. While that is a very useful exercise, there are also a number of additional nuggets in the MQ, particularly regarding Gartner's overview of the market, anticipated rates of consolidation and future market direction.

Context

For those of us who've been around the eDiscovery industry since its infancy, it's gratifying to see the electronic discovery market mature. As Gartner concludes, the promise of this industry isn't off in the future, it's now:

“E-discovery is now a well-established fact in the legal and judicial worlds. … The growth of the e-discovery market is thus inevitable, as is the acceptance of technological assistance, even in professions with long-standing paper traditions.”

The past wasn’t always so rosy, particularly when the market was dominated by hundreds of service providers that seemed to hold on by maintaining a few key relationships, combined with relatively high margins.

“The market was once characterized by many small providers and some large ones, mostly employed indirectly by law firms, rather than directly by corporations. …  Purchasing decisions frequently reflected long-standing trusted relationships, which meant that even a small book of business was profitable to providers and the effects of customary market forces were muted. Providers were able to subsist on one or two large law firms or corporate clients.”

Consolidation

The Magic Quadrant correctly notes that these “salad days” just weren’t feasible long term. Gartner sees the pace of consolidation heating up even further, with some players striking it rich and some going home empty handed.

“We expect that 2012 and 2013 will see many of these providers cease to exist as independent entities for one reason or another — by means of merger or acquisition, or business failure. This is a market in which differentiation is difficult and technology competence, business model rejuvenation or size are now required for survival. … The e-discovery software market is in a phase of high growth, increasing maturity and inevitable consolidation.”

Navigating these treacherous waters isn't easy for eDiscovery providers, nor is it simple for customers to make purchasing decisions if they're rightly concerned that the solution they buy today won't be around tomorrow. Yet, despite the prognostication of an inevitable shakeout (Gartner forecasts that the market will shrink 25% in the raw number of firms claiming eDiscovery products/services), the firm is still very bullish about the sector.

“Gartner estimates that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010. The five-year CAGR to 2015 is approximately 16%.”

This certainly means there's a window of opportunity for certain players – particularly those who help larger players fill out their EDRM suite of offerings, since the best-of-breed era is quickly going by the wayside. Gartner notes that end-to-end functionality is now table stakes in the eDiscovery space.

“We have seen a large upsurge in user requests for full-spectrum EDRM functionality. Whether that functionality will be used initially, or at all, remains an open question. Corporate buyers do seem minded to future-proof their investments in this way, by anticipating what they may wish to do with the software and the vendor in the future.”

Information Governance

Not surprisingly, it’s this “full-spectrum” functionality that most closely aligns with marrying the reactive, right side of the EDRM with the proactive, left side.  In concert, this yin and yang is referred to as information governance, and it’s this notion that’s increasingly driving buying behaviors.

“It is clear from our inquiry service that the desire to bring e-discovery under control by bringing data under control with retention management is a strategy that both legal and IT departments pursue in order to control cost and reduce risks. Sometimes the archiving solution precedes the e-discovery solution, and sometimes it follows it, but Gartner clients that feel the most comfortable with their e-discovery processes and most in control of their data are those that have put archiving systems in place …”

As Gartner looks out five years, the analyst firm anticipates more progress on the information governance front, because the “entire e-discovery industry is founded on a pile of largely redundant, outdated and trivial data.”  At some point this digital landfill is going to burst and organizations are finally realizing that if they don’t act now, it may be too late.

“During the past 10 to 15 years, corporations and individuals have allowed this data to accumulate for the simple reason that it was easy — if not necessarily inexpensive — to do so. … E-discovery has proved to be a huge motivation for companies to rethink their information management policies. The problem of determining what is relevant from a mass of information will not be solved quickly, but with a clear business driver (e-discovery) and an undeniable return on investment (deleting data that is no longer required for legal or business purposes can save millions of dollars in storage costs) there is hope for the future.”


The Gartner Magic Quadrant for E-Discovery Software is insightful for a number of reasons, not the least of which is how it portrays the developing maturity of the electronic discovery space. In just a few short years, the niche has sprouted wings, raced to $1B and is seeing massive consolidation. As we enter the next phase of maturation, we'll likely see the sector morph into a larger information governance play, given customers' “full-spectrum” functionality requirements and the presence of larger, mainstream software companies. Next on the horizon is the subsuming of eDiscovery into both the bigger information governance umbrella and other larger adjacent plays like “enterprise information archiving, enterprise content management, enterprise search and content analytics.” The rapid maturation of the eDiscovery industry will inevitably result in growing pains for vendors and practitioners alike, but in the end we'll all benefit.


About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Gartner’s “2012 Magic Quadrant for E-Discovery Software” Provides a Useful Roadmap for Legal Technologists

Tuesday, May 29th, 2012

Gartner has just released its 2012 Magic Quadrant for E-Discovery Software, an annual report that analyzes the state of the electronic discovery industry and provides a detailed vendor-by-vendor evaluation. For many, particularly those in IT circles, Gartner is an unwavering north star used to divine software market leaders, in topics ranging from business intelligence platforms to wireless LAN infrastructure. When IT professionals are on the cusp of procuring complex software, they look to analysts like Gartner for quantifiable and objective recommendations – as a way to inform and buttress their own internal decision-making processes.

But for some in the legal technology field (particularly attorneys), looking to Gartner for software analysis can seem a bit foreign. Legal practitioners are often more comfortable with the “good ole days” when the only navigation aid in the eDiscovery world was provided by the dynamic duo of George Socha and Tom Gelbmann, who (beyond creating the EDRM) were pioneers of the first eDiscovery rankings survey. Although somewhat short-lived, their Annual Electronic Discovery[i] Survey ranked the hundreds of eDiscovery providers and bucketed the top-tier players in both software and litigation support categories. The scope of their mission was grand, and they were perhaps ultimately undone by the breadth of their task (stopping the Survey in 2010), particularly as the eDiscovery landscape continued to mature, fragment and evolve.

Gartner, which has perfected the analysis of emerging software markets, appears to have taken on this challenge with an admittedly more narrow (and likely more achievable) focus. Gartner published its first Magic Quadrant (MQ) for the eDiscovery industry last year, and in the 2012 Magic Quadrant for E-Discovery Software report they’ve evaluated the top 21 electronic discovery software vendors. As with all Gartner MQs, their methodology is rigorous; in order to be included, vendors must meet quantitative requirements in market penetration and customer base and are then evaluated upon criteria for completeness of vision and ability to execute.

By eliminating the legion of service providers and law firms, Gartner has made their mission both more achievable and perhaps (to some) less relevant. When talking to certain law firms and litigation support providers, some seem to treat the Gartner initiative (and subsequent Magic Quadrant) like a map from a land they never plan to visit. But, even if they’re not directly procuring eDiscovery software, the Gartner MQ should still be seen by legal technologists as an invaluable tool to navigate the perils of the often confusing and shifting eDiscovery landscape – particularly with the rash of recent M&A activity.

Beyond the quadrant positions[ii], comprehensive analysis and secular market trends, one of the key underpinnings of the Magic Quadrant is that the ultimate position of a given provider is in many ways an aggregate measurement of overall customer satisfaction. Similar in ways to the net promoter concept (a tool that gauges the loyalty of a firm's customer relationships simply by asking how likely a customer is to recommend a product/service to a colleague), the Gartner MQ can be looked at as the sum total of all customer experiences.[iii] As such, this usage/satisfaction feedback is relevant even for parties that aren't purchasing or deploying electronic discovery software per se. Outside counsel, partners, litigation support vendors and other interested parties may all end up interacting with a deployed eDiscovery solution (particularly as such solutions expand their reach as end-to-end information governance platforms), and they should want their chosen solution to be used happily and seamlessly in a given enterprise. There's no shortage of stories about unhappy outside counsel, for example, who complain about being hamstrung by a slow, first-generation eDiscovery solution that ultimately makes their job harder (and riskier).

Next, the Gartner MQ is also a good shorthand way to understand more nuanced topics like time to value and total cost of ownership. While of course related to overall satisfaction, the Magic Quadrant does indirectly address whether the software does what it says it will (delivering on the promise) in the time frame that is claimed (delivering that promise in a reasonable time frame), since these elements are typically subsumed in the satisfaction metric. This kind of detail surfaces in the numerous interviews that Gartner conducts to go behind the scenes, querying usage and overall satisfaction.

While no navigation aid ensures that a traveler won’t get lost, the Gartner Magic Quadrant for E-Discovery Software is a useful map of the electronic discovery software world. And, particularly looking at year-over-year trends, the MQ provides a useful way for legal practitioners (beyond the typical IT users) to get a sense of the electronic discovery market landscape as it evolves and matures. After all, staying on top of the eDiscovery industry has a range of benefits beyond just software procurement.

Please register here to access the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.



[i] Note, in the good ole days folks still used two words to describe eDiscovery.

[ii] Gartner has a proprietary matrix that it uses to place the entities into four quadrants: Leaders, Challengers, Visionaries and Niche Players.

[iii] Under the Ability to Execute axis Gartner weighs a number of factors including “Customer Experience: Relationships, products and services or programs that enable clients to succeed with the products evaluated. Specifically, this criterion includes implementation experience, and the ways customers receive technical support or account support. It can also include ancillary tools, the existence and quality of customer support programs, availability of user groups, service-level agreements and so on.”

District Court Upholds Judge Peck’s Predictive Coding Order Over Plaintiff’s Objection

Monday, April 30th, 2012

In a decision that advances the predictive coding ball one step further, United States District Judge Andrew L. Carter, Jr. upheld Magistrate Judge Andrew Peck's order in Da Silva Moore, et al. v. Publicis Groupe, et al. despite Plaintiffs' multiple objections. Although Judge Carter rejected all of Plaintiffs' arguments in favor of overturning Judge Peck's predictive coding order, he did not rule on Plaintiffs' motion to recuse Judge Peck from the current proceedings – a matter that is expected to be addressed separately at a later time. Whether or not a successful recusal motion will alter this or any other rulings in the case remains to be seen.

Finding that it was within Judge Peck’s discretion to conclude that the use of predictive coding technology was appropriate “under the circumstances of this particular case,” Judge Carter summarized Plaintiff’s key arguments listed below and rejected each of them in his five-page Opinion and Order issued on April 26, 2012.

  • the predictive coding method contemplated in the ESI protocol lacks generally accepted reliability standards,
  • Judge Peck improperly relied on outside documentary evidence,
  • Defendant MSLGroup’s (“MSL’s”) expert is biased because the use of predictive coding will reap financial benefits for his company,
  • Judge Peck failed to hold an evidentiary hearing and adopted MSL’s version of the ESI protocol on an insufficient record and without proper Rule 702 consideration

Since Judge Peck’s earlier order is “non-dispositive,” Judge Carter identified and applied the “clearly erroneous or contrary to law” standard of review in rejecting Plaintiffs’ request to overturn the order. Central to Judge Carter’s reasoning is his assertion that any confusion regarding the ESI protocol is immaterial because the protocol “contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs.” In other words, Judge Carter essentially dismisses Plaintiff’s concerns as premature on the grounds that the current protocol provides a system of checks and balances that protects both parties. To be clear, that doesn’t necessarily mean Plaintiffs won’t get a second bite of the apple if problems with MSL’s productions surface.

For now, however, Judge Carter seems to be saying that although Plaintiffs must live with the current order, they are by no means relinquishing their rights to a fair and just discovery process. In fact, the existing protocol allows Plaintiffs to actively participate in and monitor the entire process closely. For example, Judge Carter writes that, “if the predictive coding software is flawed or if Plaintiffs are not receiving the types of documents that should be produced, the parties are allowed to reconsider their methods and raise their concerns with the Magistrate Judge.”

Judge Carter also specifically addresses Plaintiffs' concerns related to statistical sampling techniques, which could ultimately prove to be their meatiest argument. A key area of disagreement between the parties is whether or not MSL is reviewing enough documents to ensure relevant documents are not completely overlooked even if this complex process is executed flawlessly. Addressing this point, Judge Carter states that, “If the method provided in the protocol does not work or if the sample size is indeed too small to properly apply the technology, the Court will not preclude Plaintiffs from receiving relevant information, but to call the method unreliable at this stage is speculative.”

Although most practitioners are focused on seeing whether and how many of these novel predictive coding issues play out, it is important not to overlook two key nuggets of information lining Judge Carter’s Opinion and Order. First, Judge Carter’s statement that “[t]here simply is no review tool that guarantees perfection” serves as an acknowledgement that “reasonableness” is the standard by which discovery should be measured, not “perfection.” Second, Judge Carter’s acknowledgement that manual review with keyword searches may be appropriate in certain situations should serve as a wake-up call for those who think predictive coding technology will replace all predecessor technologies. To the contrary, predictive coding is a promising new tool to add to the litigator’s tool belt, but it is not necessarily a replacement for all other technology tools.

Plaintiffs in Da Silva Moore may not have received the ruling they were hoping for, but Judge Carter’s Opinion and Order makes it clear that the court house door has not been closed. Given the controversy surrounding this case, one can assume that Plaintiffs are likely to voice many of their concerns at a later date as discovery proceeds. In other words, don’t expect all of these issues to fade away without a fight.

First State Court Issues Order Approving the Use of Predictive Coding

Thursday, April 26th, 2012

On Monday, Virginia Circuit Court Judge James H. Chamblin issued what appears to be the first state court Order approving the use of predictive coding technology for eDiscovery. Tuesday, Law Technology News reported that Judge Chamblin issued the two-page Order in Global Aerospace Inc., et al, v. Landow Aviation, L.P. dba Dulles Jet Center, et al, over Plaintiffs’ objection that traditional manual review would yield more accurate results. The case stems from the collapse of three hangars at the Dulles Jet Center (“DJC”) that occurred during a major snow storm on February 6, 2010. The Order was issued at Defendants’ request after opposing counsel objected to their proposed use of predictive coding technology to “retrieve potentially relevant documents from a massive collection of electronically stored information.”

In Defendants’ Memorandum in Support of their motion, they argue that a first pass manual review of approximately two million documents would cost two million dollars and only locate about sixty percent of all potentially responsive documents. They go on to state that keyword searching might be more cost-effective “but likely would retrieve only twenty percent of the potentially relevant documents.” On the other hand, they claim predictive coding “is capable of locating upwards of seventy-five percent of the potentially relevant documents and can be effectively implemented at a fraction of the cost and in a fraction of the time of linear review and keyword searching.”

In their Opposition Brief, Plaintiffs argue that Defendants should produce “all responsive documents located upon a reasonable inquiry,” and “not just the 75%, or less, that the ‘predictive coding’ computer program might select.” They also characterize Defendants’ request to use predictive coding technology instead of manual review as a “radical departure from the standard practice of human review” and point out that Defendants cite no case in which a court compelled a party to accept a document production selected by a “’predictive coding’ computer program.”

Considering predictive coding technology is new to eDiscovery and first generation tools can be difficult to use, it is not surprising that both parties appear to frame some of their arguments curiously. For example, Plaintiffs either mischaracterize or misunderstand Defendants’ proposed workflow given their statement that Defendants want a “computer program to make the selections for them” instead of having “human beings look at and select documents.” Importantly, predictive coding tools require human input for a computer program to “predict” document relevance. Additionally, the proposed approach includes an additional human review step prior to production that involves evaluating the computer’s predictions.

On the other hand, some of Defendants' arguments also seem to stray a bit off course. For example, Defendants seem to unduly minimize the value of using other tools in the litigator's tool belt, like keyword search or topic grouping, to cull data prior to using potentially more expensive predictive coding technology. To broadly state that keyword searching “likely would retrieve only twenty percent of the potentially relevant documents” seems to ignore two facts. First, keyword search for eDiscovery is not dead. To the contrary, keyword searches can be an effective tool for broadly culling data prior to manual review and for conducting early case assessments. Second, the success of keyword searches and other litigation tools depends as much on the end user as the technology. In other words, the carpenter is just as important as the hammer.

The Order issued by Judge Chamblin, the current Chief Judge for the 20th Judicial Circuit of Virginia, states that “Defendants shall be allowed to proceed with the use of predictive coding for purposes of the processing and production of electronically stored information.” In a handwritten notation, the Order further provides that the processing and production are to be completed within 120 days, with “processing” to be completed within 60 days and “production to follow as soon as practicable and in no more than 60 days.” The Order does not mention whether or not the parties are required to agree upon a mutually agreeable protocol, an issue that has plagued the court and the parties in the ongoing Da Silva Moore, et al. v. Publicis Groupe, et al. for months.

Global Aerospace is the third known predictive coding case on record, but appears to present yet another set of unique legal and factual issues. In Da Silva Moore, Judge Andrew Peck of the Southern District of New York rang in the New Year by issuing the first known court order endorsing the use of predictive coding technology.  In that case, the parties agreed to the use of predictive coding technology, but continue to fight like cats and dogs to establish a mutually agreeable protocol.

Similarly, in the 7th Federal Circuit, Judge Nan Nolan is tackling the issue of predictive coding technology in Kleen Products, LLC, et al. v. Packaging Corporation of America, et al. In Kleen, Plaintiffs essentially ask that Judge Nolan order Defendants to redo their production even though Defendants have spent thousands of hours reviewing documents, have already produced over a million documents, and their review is over 99 percent complete. The parties have already presented witness testimony in support of their respective positions over the course of two full days, and more testimony may be required before Judge Nolan issues a ruling.

What is interesting about Global Aerospace is that Defendants proactively sought court approval to use predictive coding technology over Plaintiffs’ objections. This scenario is different than Da Silva Moore because the parties in Global Aerospace have not agreed to the use of predictive coding technology. Similarly, it appears that Defendants have not already significantly completed document review and production as they had in Kleen Products. Instead, the Global Aerospace Defendants appear to have sought protection from the court before moving full steam ahead with predictive coding technology and they have received the court’s blessing over Plaintiffs’ objection.

A key issue that the Order does not address is whether or not the parties will be required to decide on a mutually agreeable protocol before proceeding with the use of predictive coding technology. As stated earlier, the inability to define a mutually agreeable protocol is a key issue that has plagued the court and the parties for months in Da Silva Moore, et al. v. Publicis Groupe, et al. Similarly, in Kleen, the court was faced with issues related to the protocol for using technology tools. Both cases highlight the fact that regardless of which eDiscovery technology tools are selected from the litigator's tool belt, the tools must be used properly in order for discovery to be fair.

Judge Chamblin left the barn door wide open for Plaintiffs to lodge future objections, perhaps setting the stage for yet another heated predictive coding battle. Importantly, the Judge issued the Order “without prejudice to a receiving party” and notes that parties can object to the “completeness or the contents of the production or the ongoing use of predictive coding technology.”  Given the ongoing challenges in Da Silva Moore and Kleen, don’t be surprised if the parties in Global Aerospace Inc. face some of the same process-based challenges as their predecessors. Hopefully some of the early challenges related to the use of first generation predictive coding tools can be overcome as case law continues to develop and as next generation predictive coding tools become easier to use. Stay tuned as the facts, testimony, and arguments related to Da Silva Moore, Kleen Products, and Global Aerospace Inc. cases continue to evolve.