
Archive for the ‘production’ Category

From A to PC – Running a Defensible Predictive Coding Workflow

Tuesday, September 11th, 2012

So far in our ongoing predictive coding blog series, we’ve touched on the “whys” and “whats” of predictive coding, and now I’d like to address the “hows” of using this new technology. Given that predictive coding is groundbreaking technology in the world of eDiscovery, it’s no surprise that a different workflow is required in order to run the review process.

The traditional linear review process uses a “brute force” approach: manually reading each document and coding it for responsiveness and privilege. To reduce the high cost of this process, many organizations now farm documents out to contract attorneys for review. Contract attorneys, however, often possess less expertise and knowledge of the issues, which means multiple review passes and additional checks and balances are needed to ensure review accuracy. As a result, a significant number of documents end up being reviewed multiple times, which in turn increases the cost of review. When you step away from an “eyes-on review” of every document and use predictive coding to leverage the expertise of more experienced attorneys, the natural goal is to review as few documents as possible while still achieving the best possible results.

How do you review the minimum number of documents with predictive coding? For starters, organizations should prepare their case by performing an early case assessment (ECA) to cull down to the review population before review begins. While some suggest that predictive coding can be run without any ECA up front, you will save a significant amount of review time if you put in the effort to cull out the plainly irrelevant documents in your case. Doing so prevents a “junk in, junk out” situation, in which leaving too much junk in the case forces you to review junk documents throughout the predictive coding workflow.

Next, it is important to segregate documents that are unsuitable for predictive coding. Most predictive coding solutions operate on the extracted text content of documents. That means any documents without extracted text, such as photographs and engineering schematics, should be routed to manual review so they are not overlooked by the predictive coding engine. The same concept applies to documents with other limitations on review, such as encrypted and password-protected files. All of these documents should be reviewed separately so that no relevant documents are missed.

After culling down to your review population, the next step in preparing to use predictive coding is to create a Control Set by drawing a randomly selected statistical sample from the document population. Once the Control Set is manually reviewed, it serves two main purposes. First, it allows you to estimate the population yield, otherwise referred to as the percentage of responsive documents contained within the larger population. (The size of the Control Set may need to be adjusted to ensure the yield is properly taken into account.) Second, it serves as your baseline for a true “apples-to-apples” comparison of prediction accuracy across iterations as you move through the predictive coding workflow. The Control Set only needs to be reviewed once, up front, and can then be used to measure accuracy throughout the workflow.
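
To make the sampling step concrete, here is a minimal Python sketch, assuming the review population is simply a list of document IDs and that each Control Set document receives a manual “responsive”/“non-responsive” tag. The function names (draw_control_set, estimate_yield) and the 1,500-document example are illustrative assumptions, not features of any particular review platform.

```python
import math
import random

def draw_control_set(doc_ids, sample_size, seed=42):
    """Randomly sample document IDs from the full review population."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)

def estimate_yield(control_set_tags, confidence_z=1.96):
    """Estimate population yield (prevalence of responsive documents) from
    the manual tags on the Control Set, with a normal-approximation margin
    of error at roughly 95% confidence."""
    n = len(control_set_tags)
    responsive = sum(1 for tag in control_set_tags if tag == "responsive")
    p = responsive / n
    margin = confidence_z * math.sqrt(p * (1 - p) / n)
    return p, margin

# Hypothetical example: a 1,500-document Control Set with 60 responsive tags
tags = ["responsive"] * 60 + ["non-responsive"] * 1440
p, margin = estimate_yield(tags)
print(f"Estimated yield: {p:.1%} +/- {margin:.1%}")  # ~4.0% +/- 1.0%
```

A low estimated yield like this is exactly the situation, discussed below, in which a purely random training set would contain too few responsive examples.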

It is essential that the documents in the Control Set are selected randomly from the entire population. While some believe that other sampling approaches give better peace of mind, they may actually result in unnecessary review. For example, other workflows recommend sampling from the documents that are not predicted to be relevant to see whether anything was left behind. If you instead create a proper Control Set from the entire population, you get precision and recall metrics that are representative of the entire population, which in turn includes the documents that are not predicted to be relevant.

Once the Control Set is created, you can begin training the software to evaluate documents against the review criteria in the case. Selecting the optimal set of documents to train the system (commonly referred to as the training set or seed set) is one of the most important steps in the entire predictive coding workflow because it sets the initial accuracy of the system, so it should be chosen carefully. Some suggest creating the initial training set by taking a random sample from the population (much as the Control Set is selected) instead of proactively selecting responsive documents. The important thing to understand, however, is that the items used for training should adequately represent the responsive items. The reason selecting responsive documents for the training set matters is that most eDiscovery cases have low yield, meaning the prevalence of responsive documents within the overall document population is low. If the training set does not contain enough responsive documents, the system will not be able to learn how to identify responsive items effectively.

An effective method for selecting the initial training set is to use a targeted search to locate a small set of documents (typically between 100 and 1,000) that is expected to be roughly 50% responsive. For example, you may choose to focus on only the key custodians in the case and use a combination of tighter keyword, date range, and similar search criteria. You do not have to perform exhaustive searches, but a high-quality initial training set will likely minimize the amount of additional training needed to achieve high prediction accuracy.
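
As an illustration only, the sketch below shows one way such a targeted seed-set search might be scripted, assuming documents are available as simple records with custodian, date and extracted-text fields. The field names, custodians, keywords and cap are hypothetical placeholders rather than part of any actual platform’s workflow.

```python
from datetime import date

def select_seed_candidates(documents, custodians, date_from, date_to, keywords, cap=1000):
    """Apply tighter custodian/date/keyword criteria to locate a small
    candidate seed set expected to be rich in responsive documents."""
    hits = []
    for doc in documents:
        if doc["custodian"] not in custodians:
            continue
        if not (date_from <= doc["sent_date"] <= date_to):
            continue
        text = doc["text"].lower()
        if any(kw in text for kw in keywords):
            hits.append(doc["id"])
        if len(hits) >= cap:
            break
    return hits

# Hypothetical criteria: two key custodians, a narrow date window, case-specific terms
seeds = select_seed_candidates(
    documents=[],  # placeholder for the culled review population
    custodians={"jsmith", "mdoe"},
    date_from=date(2010, 1, 1),
    date_to=date(2010, 6, 30),
    keywords=["project alpha", "contract amendment"],
)
```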

After the initial training set is selected, it must be reviewed. It is extremely important that the review decisions made on training items are as accurate as possible, since the system will be learning from these items; this typically means the more experienced case attorneys should handle this review. Once review is finished on all of the training documents, the system can learn from the tagging decisions in order to predict the responsiveness or non-responsiveness of the remaining documents.

While you could now generate predictions for every other document in the population, the most important set to predict on at this stage is the Control Set. Not only is this typically faster than applying predictions to every document in the case, but you need predictions on all of the Control Set documents in order to assess the accuracy of the predictions. With predictions and tagging decisions on each of the Control Set documents, you can calculate precision and recall metrics that extrapolate to the entire review population.
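
The precision and recall arithmetic itself is straightforward. Below is a minimal Python sketch that compares the attorneys’ manual Control Set tags against the engine’s predictions; the counts in the example are made up purely to show the calculation.

```python
def control_set_metrics(control_set):
    """Compute precision and recall from (manual_tag, predicted_tag) pairs,
    where each tag is True for responsive and False for non-responsive."""
    tp = sum(1 for actual, predicted in control_set if actual and predicted)
    fp = sum(1 for actual, predicted in control_set if not actual and predicted)
    fn = sum(1 for actual, predicted in control_set if actual and not predicted)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical example: 60 responsive control documents, 45 of them predicted
# responsive, plus 30 non-responsive documents wrongly predicted responsive
pairs = ([(True, True)] * 45 + [(True, False)] * 15
         + [(False, True)] * 30 + [(False, False)] * 1410)
precision, recall = control_set_metrics(pairs)
print(f"precision={precision:.0%}, recall={recall:.0%}")  # precision=60%, recall=75%
```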

At this point, the accuracy of the predictions is unlikely to be optimal, and so the iterative process begins. To increase accuracy, you must select additional documents for training the system. Much like the initial training set, this additional training set must be selected carefully. The best documents to add are those the system would otherwise be unable to predict accurately, and the software can often identify this set mathematically more effectively than human reviewers choosing documents by hand. Once these documents are selected, you simply continue the iterative process of training, predicting and testing until your precision and recall reach an acceptable point. Following this workflow results in a set of documents identified as responsive by the system, along with trustworthy and defensible accuracy metrics.
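
For readers who want to see the shape of that loop, here is a simplified Python sketch using scikit-learn logistic regression with uncertainty sampling as a stand-in for whatever proprietary method a given predictive coding engine actually uses to pick additional training documents. It assumes feature vectors X and tags labels are already available for every document (in a real workflow, attorneys would tag only the newly selected documents each round); the names and thresholds are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_predict_test_loop(X, labels, control_idx, target_recall=0.75,
                            batch_size=200, max_rounds=10, seed=0):
    """Iteratively train, predict on the Control Set, measure recall, and add
    the documents the model is least certain about to the next training round."""
    rng = np.random.default_rng(seed)
    control = set(control_idx)
    unlabeled = np.array([i for i in range(len(labels)) if i not in control])
    # Initial training set; assumed to contain both responsive and non-responsive tags
    training_idx = rng.choice(unlabeled, size=batch_size, replace=False)

    for round_num in range(1, max_rounds + 1):
        model = LogisticRegression(max_iter=1000)
        model.fit(X[training_idx], labels[training_idx])  # learn from attorney tags

        # Test: recall measured against the manually reviewed Control Set
        predicted = model.predict(X[control_idx])
        actual = labels[control_idx]
        recall = np.sum((actual == 1) & (predicted == 1)) / max(np.sum(actual == 1), 1)
        print(f"round {round_num}: recall on Control Set = {recall:.0%}")
        if recall >= target_recall:
            break

        # Select the next training batch: documents with the most uncertain scores
        remaining = np.setdiff1d(unlabeled, training_idx)
        uncertainty = np.abs(model.predict_proba(X[remaining])[:, 1] - 0.5)
        next_batch = remaining[np.argsort(uncertainty)[:batch_size]]
        training_idx = np.concatenate([training_idx, next_batch])

    return model
```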

You cannot simply produce all of these documents at this point, however. The documents must still go through a privilege screen to remove any documents that should not be produced, along with any other review measures you normally apply to responsive documents. This does, however, open up the possibility of applying additional rounds of predictive coding on top of the responsive set. For example, after running the privilege screen, you can train on the privilege tag and attempt to identify additional privileged documents in your responsive set that were missed.

The important thing to keep in mind is that predictive coding is meant to strengthen your existing review workflows. While we have outlined one possible workflow that utilizes predictive coding, the flexibility of the technology lends itself to a multitude of other uses, including prioritizing a linear review. Whatever application you choose, predictive coding is sure to be an effective tool in your future reviews.

Clean Sweep in Kleen Products Predictive Coding Battle? Not Exactly

Friday, August 24th, 2012

The tears of sadness shed by those in the eDiscovery community lamenting the end of the predictive coding debate in Kleen Products may turn to tears of joy when they realize that the debate could resurface next year. Despite early reports, the Plaintiffs in Kleen did not completely roll over on their argument that defendants should be required to use what they characterize as “Content Based Advanced Analytics” (“CBAA”). To the contrary, Plaintiffs preserved their right to meet and confer with Defendants about future document productions after October 1, 2013. Not surprisingly, future document productions could rekindle the fiery debate about the use of predictive coding technology.

The controversy surrounding Kleen Products, LLC, et al. v. Packaging Corporation of America, et al. was sparked earlier this year when Plaintiffs asked Judge Nolan to order Defendants to redo their previous productions and all future productions using CBAA. Among other things, Plaintiffs claimed that if Defendants had used “CBAA” tools (a term they did not define) such as predictive coding technology, then their production would have been more thorough. In June, I reported that hearing transcripts indicated 7th Circuit Magistrate Judge Nan Nolan was urging the parties to focus on developing a mutually agreeable keyword approach to eDiscovery instead of debating whether other search and review methodologies would yield better results. This nudging by Judge Nolan was not surprising, considering at least some of the defendants had already spent considerable time and money managing the document production process using more traditional tools rather than predictive coding.

In a new twist, reports from other sources surfaced recently, suggesting that the Plaintiffs in Kleen decided to completely withdraw their demands that Defendants use predictive coding during discovery. The news likely disappointed many in the electronic discovery space poised to witness a third round of expert testimony pitting more traditional eDiscovery approaches against predictive coding technology. However, any such disappointment is premature because those dreaming of an eDiscovery showdown in Kleen could still see their dreams come true next year.

On August 21, Judge Nolan did indeed sign a joint “Stipulation and Order Relating to ESI Search.” However, in the order the Plaintiffs withdrew “their demand that defendants apply CBAA to documents contained in the First Request Corpus (emphasis added).” Plaintiffs go on to stipulate that they will not “argue or contend that defendants should be required to use or apply the types of CBAA or “predictive coding” methodology… with respect to any requests for production served on any defendant prior to October 1, 2013 (emphasis added).” Importantly, the Plaintiffs preserved their right to meet and confer regarding the appropriate search methodology to be used for future collections if discovery continues past October of next year.

Considering the parties have only scratched the surface of discovery thus far, the likelihood that the predictive coding issue will resurface again is high unless settlement is reached or Defendants have a change of heart. In short, the door is still wide open for Plaintiffs to argue that Defendants should be required to use predictive coding technology to manage future productions, and rumors about the complete demise of predictive coding in the Kleen Products case have been exaggerated.

Magic 8 Ball Predictions for eDiscovery in Florida: FRCP, FOIA and the Sunshine Laws

Thursday, August 23rd, 2012

The Sunshine State is shining a new ray of light on the information governance and eDiscovery space with new civil procedure rules addressing electronically stored information (ESI). The new rules, which go into effect September 1, 2012, are six years in the making and the product of many iterations and much debate among practitioners, neutrals and jurists. While they generally mirror the Federal Rules of Civil Procedure (FRCP) and embrace much of Sedona’s Cooperation Proclamation, there are some marked procedural differences, although the rules generally accomplish the same goals.

For example, instead of mandating a meet and confer conference (a la the FRCP), the new state rules provide for these negotiations in a case management conference pursuant to Rule 1.200-1.201. None of the Florida rules are a surprise since they wisely promote early discussions regarding potential discovery problems, understanding of information management systems, and competency on the part of lawyers and their clients to effectively address litigation hold practices and preservation – just as the FRCP do.

There are comprehensive blogs that have already covered the nuts and bolts of how the rules change the practice of law in Florida with regard to ESI, as well as a fantastic video featuring Judge Richard Nielsen, who piloted these principles in his Florida court. Perhaps the most interesting legal issues facing Florida have to do with the impact of the new rules intersecting with open government and recordkeeping, and what the government’s burden will be, on a go-forward basis, to produce metadata.

This is not to say the private sector won’t have to make changes as well, because anyone litigating in Florida should take eDiscovery seriously given recent cases like Coquina Investments v. Rothstein. In this case, Judge Marcia Cooke boldly sanctioned the defendant(s) and their lawyers for failing to preserve, search and produce information relevant to the case. One of the issues in the case involved format; paper documents were produced by the defendant when they should have been electronically produced with relevant metadata.

The federal government has had a brush with this nexus, although it remains unresolved. In the NDLON case, Judge Scheindlin initially ordered the government to produce select metadata, but subsequently retracted her ruling. Critics of the initial holding claim she confused the discovery requirements of the FRCP and the Freedom of Information Act (FOIA). While the two have different legal standards – reasonableness under FOIA and proportionality under the FRCP – this issue is a red herring.

The differing standards are not the true issue; the crux is the ability to conduct a thorough search, retrieve relevant information, and produce metadata appropriately. FOIA is in many cases a more stringent standard than the FRCP, and this puts even more pressure on the government to improve its technology. The simple premise that documents should be produced in the manner they were created, or alternatively with all of the characteristics necessary to the merits of a proceeding, is not technologically difficult to attain. Nor is the redaction of sensitive information due to relevance or an exemption.

Florida’s most notable legal contribution to information governance up to this point has been the most comprehensive body of legislation in the United States addressing the right to information and access to public records (the Sunshine Laws). Early on, Florida embraced the concept that information created by the government needs to be accessible to the public, and it has adopted policies and technologies to address this responsibility.

Florida has historically been the most transparent of all the states and proactive about clarifying how certain communications (specifically ESI) become public records. In the near future, these laws will further push Florida toward becoming the most progressive state with regard to its information management and in-house eDiscovery capabilities. More than the laws being on the books, the sheer number of lawsuits increasingly involving the Sunshine Laws and ESI will be the impetus for much of this technological innovation.

Today we are in the age of information governance, and at the dawn of mainstream predictive coding for litigation. Increasingly, organizations are archiving information and deploying in-house eDiscovery capabilities in pursuit of the promise of gaining control of their data, limiting risk, and deriving value from it. The fact that civil litigants frequently sue the government under FOIA and the Sunshine Laws creates a nexus that must and will be resolved in the near future.

The most brilliant part of the first NDLON ruling regarding metadata was that it spoke to the concept of the FRCP and FOIA being aligned. Both are requests for production, and while they have differing legal standards, it is inefficient to conduct those searches in a different, unrelated manner once an information governance infrastructure has been implemented. When they collide, a party must contend with both, and the new rules will bring this issue to a head. The tools used for a discovery request can and should be the same as those used to comply with a FOIA production – and they should be in place from the start. For a state like Florida, a case involving the Sunshine Laws will eventually pose this question, but now under more ESI-savvy rules. Florida cannot afford to reinvent the wheel or scramble to comply with requests; a proactive infrastructure needs to be in place.

Florida’s new rules will impact all areas of state and local government in civil litigation, as well as state-funded educational institutions. Questions about format, employee self-collection, retention and litigation holds are going to get very hot in the Sunshine State because the government is more accountable there. As Louis Brandeis said, “Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.” This may be a rare case of state case law driving federal rulemaking, coupled with a need for technological advancement on the government’s part.

Addressing the Challenges of Cross-Border Data Protection and eDiscovery

Friday, August 17th, 2012

One of the more troubling eDiscovery issues that globalization has inadvertently imposed on organizations is compliance with a complex set of international data protection and privacy laws. These laws present a significant challenge to U.S. companies, which enjoy fewer domestic restraints on collecting and storing the personal data of their employees and consumers.

It’s not that these laws are unfamiliar concepts to U.S. corporations. Contrary to popular belief, statutes and regulations do exist in the U.S. to help protect certain personal and financial information from unauthorized disclosure. Nevertheless, the U.S. approach to data protection remains a patchwork that does not match the comprehensive frameworks found in other regions, particularly Europe.

Data Protection in Europe

The data protection regime adopted by the European Union (EU) presents unique information governance challenges to even the most sophisticated organizations. Developed to address the abuses of twentieth century fascism and communism, the EU system emphasizes the importance of securing personal information from unreasonable government and corporate intrusions. To guard against such intrusions, the EU member states have enacted laws that curtail processing, collection and storage of personal data. For example, European laws generally prevent organizations from processing personal information unless it is done for a lawful purpose and is not excessive. Furthermore, personal data may not be maintained longer than is necessary and must be properly secured.

Beyond these basic data protection principles, certain countries in Europe provide additional safeguards. In Germany, for instance, state governments have implemented their own data privacy provisions that operate independently of, and in the case of the German state of Schleswig-Holstein are more exacting than, the larger EU protection framework. Furthermore, corporate data processing in Germany must satisfy company Works Councils, which represent the interests of employees and protect their privacy rights.

The Clash between Data Protection Laws and Litigation Discovery Rules

A significant area of complexity facing organizations with respect to the governance of personal information concerns the treatment of that data in European and cross-border litigation. In domestic European litigation, personal data could be subject to discovery if it supports the claims of the parties or a court orders its disclosure. That could place an organization in the tricky position of having to produce personal data that may very well be protected by privacy laws. While legal exceptions do exist for these situations, the person whose data is subject to disclosure may nonetheless seek to prevent its dissemination on privacy grounds. Furthermore, company Works Councils and Data Protection Officers may object to these disclosures.

Additional difficulty may arise when addressing international discovery requests that seek personal information. Companies whose European offices receive these requests must ensure that the country where the data will be transferred has enacted laws that meet EU data protection standards. Transfers of personal data to countries that do not meet those standards are generally forbidden, with fines and even prison time imposed for non-compliance.

Certain countries have more stringent rules regarding proposed transfers of personal information. In France, for example, international discovery requests that seek personal data must comply with the rules promulgated by the French data protection authority, La Commission Nationale de l’Informatique et des Libertés (CNIL). Those rules require that the CNIL and the data subjects be notified of the proposed data transfer. In addition, disclosures must be limited to relevant information, with appropriate redactions of data that could be used to identify the data subjects.

Additional complications may arise for enterprises whose European offices have been served with discovery requests from the U.S. Despite the restrictions imposed by European data protection authorities and the penalties for noncompliance, organizations are often compelled by U.S. courts to produce personal information without regard to these laws. Noncompliance could subject organizations to U.S. court sanctions or, on the other hand, fines and possibly even jail time under European data protection laws.

Using Information Governance to Solve the Data Protection Conundrum

Given the complexity of ensuring conformity with foreign privacy rules and the penalties for noncompliance, organizations should consider developing an information governance strategy to effectively address these issues. Such an approach will typically require the data management principals (legal and IT) to work together on the myriad of legal and logistical issues surrounding information retention.

Legal and IT should also develop a process for how the organization will address data preservation and production during litigation. Where applicable, Works Councils and Data Protection Officers should be involved in the process to ensure that data protection laws are properly observed and employee privacy rights are safeguarded.

An effective governance strategy should also incorporate effective, enabling technologies to meet company information management goals while observing data protection laws. Archiving software, data loss prevention functionality and eDiscovery tools are all examples of technologies that together provide the means to protect personal information processed in connection with an organization’s information governance strategy.

By following these steps, organizations will be better prepared for the challenges of addressing cross-border data protection laws and the legal traps that are inextricably intertwined with globalization.

Gartner’s 2012 Magic Quadrant for E-Discovery Software Looks to Information Governance as the Future

Monday, June 18th, 2012

Gartner recently released its 2012 Magic Quadrant for E-Discovery Software, its annual report analyzing the state of the electronic discovery industry. Many vendors in the Magic Quadrant (MQ) may initially focus on their position and the juxtaposition of their competitive neighbors along the Vision and Execution axes. While that is a very useful exercise, there are also a number of additional nuggets in the MQ, particularly regarding Gartner’s overview of the market, anticipated rates of consolidation and future market direction.

Context

For those of us who’ve been around the eDiscovery industry since its infancy, it’s gratifying to see the electronic discovery industry mature.  As Gartner concludes, the promise of this industry isn’t off in the future, it’s now:

“E-discovery is now a well-established fact in the legal and judicial worlds. … The growth of the e-discovery market is thus inevitable, as is the acceptance of technological assistance, even in professions with long-standing paper traditions.”

The past wasn’t always so rosy, particularly when the market was dominated by hundreds of service providers that seemed to hold on by maintaining a few key relationships, combined with relatively high margins.

“The market was once characterized by many small providers and some large ones, mostly employed indirectly by law firms, rather than directly by corporations. …  Purchasing decisions frequently reflected long-standing trusted relationships, which meant that even a small book of business was profitable to providers and the effects of customary market forces were muted. Providers were able to subsist on one or two large law firms or corporate clients.”

Consolidation

The Magic Quadrant correctly notes that these “salad days” just weren’t feasible long term. Gartner sees the pace of consolidation heating up even further, with some players striking it rich and some going home empty handed.

“We expect that 2012 and 2013 will see many of these providers cease to exist as independent entities for one reason or another — by means of merger or acquisition, or business failure. This is a market in which differentiation is difficult and technology competence, business model rejuvenation or size are now required for survival. … The e-discovery software market is in a phase of high growth, increasing maturity and inevitable consolidation.”

Navigating these treacherous waters isn’t easy for eDiscovery providers, nor is it simple for customers to make purchasing decisions if they’re correctly concerned that the solution they buy today won’t be around tomorrow. Yet, despite the prognostication of an inevitable shakeout (Gartner forecasts that the market will shrink 25% in the raw number of firms claiming eDiscovery products/services), Gartner is still very bullish about the sector.

“Gartner estimates that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010. The five-year CAGR to 2015 is approximately 16%.”

This certainly means there’s a window of opportunity for certain players – particularly those who help larger players fill out their EDRM suite of offerings, since the best of breed era is quickly going by the wayside.  Gartner notes that end-to-end functionality is now table stakes in the eDiscovery space.

“We have seen a large upsurge in user requests for full-spectrum EDRM functionality. Whether that functionality will be used initially, or at all, remains an open question. Corporate buyers do seem minded to future-proof their investments in this way, by anticipating what they may wish to do with the software and the vendor in the future.”

Information Governance

Not surprisingly, it’s this “full-spectrum” functionality that most closely aligns with marrying the reactive, right side of the EDRM with the proactive, left side.  In concert, this yin and yang is referred to as information governance, and it’s this notion that’s increasingly driving buying behaviors.

“It is clear from our inquiry service that the desire to bring e-discovery under control by bringing data under control with retention management is a strategy that both legal and IT departments pursue in order to control cost and reduce risks. Sometimes the archiving solution precedes the e-discovery solution, and sometimes it follows it, but Gartner clients that feel the most comfortable with their e-discovery processes and most in control of their data are those that have put archiving systems in place …”

As Gartner looks out five years, the analyst firm anticipates more progress on the information governance front, because the “entire e-discovery industry is founded on a pile of largely redundant, outdated and trivial data.”  At some point this digital landfill is going to burst and organizations are finally realizing that if they don’t act now, it may be too late.

“During the past 10 to 15 years, corporations and individuals have allowed this data to accumulate for the simple reason that it was easy — if not necessarily inexpensive — to do so. … E-discovery has proved to be a huge motivation for companies to rethink their information management policies. The problem of determining what is relevant from a mass of information will not be solved quickly, but with a clear business driver (e-discovery) and an undeniable return on investment (deleting data that is no longer required for legal or business purposes can save millions of dollars in storage costs) there is hope for the future.”

 

The Gartner Magic Quadrant for E-Discovery Software is insightful for a number of reasons, not the least of which is how it portrays the developing maturity of the electronic discovery space. In just a few short years, the niche has sprouted wings, raced to $1B and is seeing massive consolidation. As we enter the next phase of maturation, we’ll likely see the sector morph into a larger, information governance play, given customers’ “full-spectrum” functionality requirements and the presence of larger, mainstream software companies.  Next on the horizon is the subsuming of eDiscovery into both the bigger information governance umbrella, as well as other larger adjacent plays like “enterprise information archiving, enterprise content management, enterprise search and content analytics.” The rapid maturation of the eDiscovery industry will inevitably result in growing pains for vendors and practitioners alike, but in the end we’ll all benefit.

 

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Kleen Products Predictive Coding Update – Judge Nolan: “I am a believer of principle 6 of Sedona”

Tuesday, June 5th, 2012

Recent transcripts reveal that 7th Circuit Magistrate Judge Nan Nolan has urged the parties in Kleen Products, LLC, et. al. v. Packaging Corporation of America, et. al. to focus on developing a mutually agreeable keyword search strategy for eDiscovery instead of debating whether other search and review methodologies would yield better results. This is big news for litigators and others in the electronic discovery space because many perceived Kleen Products as potentially putting keyword search technology on trial, compared to newer technology like predictive coding. Considering keyword search technology is still widely used in eDiscovery, a ruling by Judge Nolan requiring defendants to redo part of their production using technology other than keyword searches would sound alarm bells for many litigators.

The controversy surrounding Kleen Products, LLC, et al. v. Packaging Corporation of America, et al. relates both to Plaintiffs’ position and to the status of discovery in the case. Plaintiffs initially asked Judge Nolan to order Defendants to redo their previous productions and all future productions using alternative technology. The request was surprising to many observers because some Defendants had already spent thousands of hours reviewing and producing in excess of one million documents; that number has since surpassed three million documents. Among other things, Plaintiffs claim that if Defendants had used “Content Based Advanced Analytics” tools (a term they did not define) such as predictive coding technology, then their production would have been more thorough. Notably, Plaintiffs do not appear to point to any instances of specific documents missing from Defendants’ productions.

In response, Defendants countered that their use of keyword search technology, and their eDiscovery methodology in general, was extremely rigorous and thorough. More specifically, they highlight their use of advanced culling and analysis tools (such as domain filtering and email threading) in addition to keyword search tools. Defendants also claim they cooperated with Plaintiffs by allowing them to participate in the selection of keywords used to search for relevant documents. Perhaps going above and beyond the eDiscovery norm, the Defendants even instituted a detailed document sampling approach designed to measure the quality of their document productions.

Following two full days of expert witness testimony regarding the adequacy of Defendants’ initial productions, Judge Nolan finally asked the parties to try to reach a compromise on the “Boolean” keyword approach. She apparently reasoned that having the parties work out a mutually agreeable approach based on what Defendants had already implemented was preferable to scheduling yet another full day of expert testimony — even though additional expert testimony is still an option.

In a nod to the Sedona Principles, she further explained her rationale on March 28, 2012, at the conclusion of the second day of testimony:

“the defendants had done a lot of work, the defendant under Sedona 6 has the right to pick the [eDiscovery] method. Now, we all know, every court in the country has used Boolean search, I mean, this is not like some freak thing that they [Defendants] picked out…”

Judge Nolan’s reliance on the Sedona Best Practices Recommendations & Principles for Addressing Electronic Document Production reveals how she would likely rule if Plaintiffs renew their position that Defendants should have used predictive coding or some other kind of technology in lieu of keyword searches. Sedona Principle 6 states that:

“[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

In other words, Judge Nolan confirmed that in her court, opposing parties typically may not dictate what technology solutions their opponents must use without some indication that the technology or process used failed to yield accurate results. Judge Nolan also observed that quality and accuracy are key guideposts regardless of the technology utilized during the eDiscovery process:

“what I was learning from the two days, and this is something no other court in the country has really done too, is how important it is to have quality search. I mean, if we want to use the term “quality” or “accurate,” but we all want this…– how do you verify the work that you have done already, is the way I put it.”

Although Plaintiffs have reserved their right to reintroduce their technology arguments, recent transcripts suggest that Defendants will not be required to use different technology. Plaintiffs continue to meet and confer with individual Defendants to agree on keyword searches, as well as the types of data sources that must be included in the collection. The parties and Judge also appear to agree that they would like to continue making progress with 30(b)(6) depositions and other eDiscovery issues before Judge Nolan retires in a few months, rather than begin a third day of expert hearings regarding technology related issues. This appears to be good news for the Judge and the parties since the eDiscovery issues now seem to be headed in the right direction as a result of mutual cooperation between the parties and some nudging by Judge Nolan.

There is also good news for outside observers in that Judge Nolan has provided some sage guidance to help future litigants before she steps down from the bench. For example, it is clear that Judge Nolan and other judges continue to emphasize the importance of cooperation in today’s complex new world of technology. Parties should be prepared to cooperate and be more transparent during discovery given the judiciary’s increased reliance on the Sedona Cooperation Proclamation. Second, Kleen Products illustrates that keyword search is not dead. Instead, keyword search should be viewed as one of many tools in the Litigator’s Toolbelt™ that can be used with other tools such as email threading, advanced filtering technology, and even predictive coding tools.  Finally, litigators should take note that regardless of the tools they select, they must be prepared to defend their process and use of those tools or risk the scrutiny of judges and opposing parties.

Gartner’s “2012 Magic Quadrant for E-Discovery Software” Provides a Useful Roadmap for Legal Technologists

Tuesday, May 29th, 2012

Gartner has just released its 2012 Magic Quadrant for E-Discovery Software, an annual report that analyzes the state of the electronic discovery industry and provides a detailed vendor-by-vendor evaluation. For many, particularly those in IT circles, Gartner is an unwavering north star used to divine software market leaders in areas ranging from business intelligence platforms to wireless LAN infrastructure. When IT professionals are on the cusp of procuring complex software, they look to analysts like Gartner for quantifiable and objective recommendations as a way to inform and buttress their own internal decision-making processes.

But for some in the legal technology field (particularly attorneys), looking to Gartner for software analysis can seem a bit foreign. Legal practitioners are often more comfortable with the “good ole days” when the only navigation aid in the eDiscovery world was provided by the dynamic duo of George Socha and Tom Gelbmann, who (beyond creating the EDRM) were pioneers of the first eDiscovery rankings survey. Albeit somewhat short-lived, their Annual Electronic Discovery[i] Survey ranked the hundreds of eDiscovery providers and bucketed the top-tier players in both software and litigation support categories. The scope of their mission was grand, and they were perhaps ultimately undone by the breadth of their task (stopping the Survey in 2010), particularly as the eDiscovery landscape continued to mature, fragment and evolve.

Gartner, which has perfected the analysis of emerging software markets, appears to have taken on this challenge with an admittedly more narrow (and likely more achievable) focus. Gartner published its first Magic Quadrant (MQ) for the eDiscovery industry last year, and in the 2012 Magic Quadrant for E-Discovery Software report they’ve evaluated the top 21 electronic discovery software vendors. As with all Gartner MQs, their methodology is rigorous; in order to be included, vendors must meet quantitative requirements in market penetration and customer base and are then evaluated upon criteria for completeness of vision and ability to execute.

By eliminating the legion of service providers and law firms, Gartner has made their mission both more achievable and perhaps (to some) less relevant. When talking to certain law firms and litigation support providers, some seem to treat the Gartner initiative (and subsequent Magic Quadrant) like a map from a land they never plan to visit. But, even if they’re not directly procuring eDiscovery software, the Gartner MQ should still be seen by legal technologists as an invaluable tool to navigate the perils of the often confusing and shifting eDiscovery landscape – particularly with the rash of recent M&A activity.

Beyond the quadrant positions[ii], comprehensive analysis and secular market trends, one of the key underpinnings of the Magic Quadrant is that the ultimate position of a given provider is in many ways an aggregate measurement of overall customer satisfaction. Similar in ways to the net promoter concept (a tool that gauges the loyalty of a firm’s customer relationships simply by asking how likely a customer is to recommend a product/service to a colleague), the Gartner MQ can be looked at as the sum total of all customer experiences.[iii] As such, this usage/satisfaction feedback is relevant even for parties that aren’t purchasing or deploying electronic discovery software per se. Outside counsel, partners, litigation support vendors and other interested parties may all end up interacting with a deployed eDiscovery solution (particularly as such solutions have expanded their reach as end-to-end information governance platforms), and they should want their chosen solution to be used happily and seamlessly in a given enterprise. There’s no shortage of stories about unhappy outside counsel (for example) who complain about being hamstrung by a slow, first-generation eDiscovery solution that ultimately makes their job harder (and riskier).

Next, the Gartner MQ is also a good shorthand way to understand more nuanced topics like time to value and total cost of ownership. While of course related to overall satisfaction, the Magic Quadrant does indirectly address whether the software does what it says it will (delivering on the promise) in the time frame that is claimed (delivering the promise in a reasonable time frame), since these elements are typically subsumed in the satisfaction metric. This kind of detail emerges from the numerous interviews Gartner conducts to go behind the scenes, querying usage and overall satisfaction.

While no navigation aid ensures that a traveler won’t get lost, the Gartner Magic Quadrant for E-Discovery Software is a useful map of the electronic discovery software world. And, particularly looking at year-over-year trends, the MQ provides a useful way for legal practitioners (beyond the typical IT users) to get a sense of the electronic discovery market landscape as it evolves and matures. After all, staying on top of the eDiscovery industry has a range of benefits beyond just software procurement.

Please register here to access the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.



[i] Note, in the good ole days folks still used two words to describe eDiscovery.

[ii] Gartner has a proprietary matrix that it uses to place the entities into four quadrants: Leaders, Challengers, Visionaries and Niche Players.

[iii] Under the Ability to Execute axis Gartner weighs a number of factors including “Customer Experience: Relationships, products and services or programs that enable clients to succeed with the products evaluated. Specifically, this criterion includes implementation experience, and the ways customers receive technical support or account support. It can also include ancillary tools, the existence and quality of customer support programs, availability of user groups, service-level agreements and so on.”

7th Circuit eDiscovery Pilot Program Tackles Technology Assisted Review With Mock Arguments

Tuesday, May 22nd, 2012

The 7th Circuit eDiscovery Pilot Program’s Mock Argument is the first of its kind and is slated for June 14, 2012. It is not surprising that the Seventh Circuit’s eDiscovery Pilot Program would be the first to host an event like this on predictive coding, as the program has been a progressive model across the country for eDiscovery protocols since 2009. The predictive coding event is open to the public (registration required) and showcases the expertise of leading litigators, technologists and experts from all over the United States. Speakers include: Jason R. Baron, Director of Litigation at the National Archives and Records Administration; Maura R. Grossman, Counsel at Wachtell, Lipton, Rosen & Katz; Dr. David Lewis, Technology Expert and co-founder of the TREC Legal Track; Ralph Losey, Partner at Jackson Lewis; Matt Nelson, eDiscovery Counsel at Symantec; Lisa Rosen, President of Rosen Technology Resources; Jeff Sharer, Partner at Sidley Austin; and Tomas Thompson, Senior Associate at DLA Piper.

The eDiscovery 2.0 blog has extensively covered the three recent predictive coding cases currently being litigated, and while real court cases are paramount to the direction of predictive coding, the 7th Circuit program will proactively address a scenario that has not yet been considered by a court. In Da Silva Moore, the parties agreed to the use of predictive coding, but couldn’t subsequently agree on the protocol. In Kleen, plaintiffs want defendants to redo their review process using predictive coding even though the production is 99% complete. And, in Global Aerospace, the defendant proactively petitioned to use predictive coding over plaintiff’s objections. By contrast, the 7th Circuit’s hypothetical mock argument presents another likely predictive coding scenario: the instance where a defendant already has an in-house solution deployed and argues against the use of predictive coding before discovery has begun.

Traditionally, courts have been reluctant to bless or condemn particular technologies, preferring instead to rule on the reasonableness of an organization’s process and to rely on expert testimony for issues beyond that scope. It is expected that predictive coding will follow suit; however, because so little is understood about how the technology works, interest has been generated in a way the legal technology industry has not seen before, as evidenced by this tactical program.

* * *

The hypothetical dispute is a complex litigation matter pending in a U.S. District Court involving a large public corporation that has been sued by a smaller high-tech competitor for alleged anticompetitive conduct, unfair competition and various business torts.  The plaintiff has filed discovery requests that include documents and communications maintained by the defendant corporation’s vast international sales force.  To expedite discovery and level the playing field in terms of resources and costs, the Plaintiff has requested the use of predictive coding to identify and produce responsive documents.  The defendant, wary of the latest (and untested) eDiscovery technology trends, argues that the organization already has a comprehensive eDiscovery program in place.  The defendant will further argue that the technological investment and defensible processes in-house are more than sufficient for comprehensive discovery, and in fact, were designed in order to implement a repeatable and defensible discovery program.  The methodology of the defendant is estimated to take months and result in the typical massive production set, whereas predictive coding would allegedly make for a shorter discovery period.  Because of the burden, the defendant plans to shift some of these costs to the plaintiff.

Ralph Losey will play the Magistrate Judge; defense counsel will be Martin T. Tully (partner, Katten Muchin Rosenman LLP), with Karl Schieneman (of Review Less/ESI Bytes) as the corporation’s litigation support manager; and plaintiff’s counsel will be Sean Byrne (eDiscovery solutions director at Axiom), with Herb Roitblat (of OrcaTec) as plaintiff’s eDiscovery consultant.

As the hottest topic in the eDiscovery world, the promises of predictive coding include: increased search accuracy for relevant documents, decreased cost and time spent for manual review, and possibly greater insight into an organization’s corpus of data allowing for more strategic decision making with regard to early case assessment.  The practical implications of predictive coding use are still to be determined and programs like this one will flesh out some of those issues before they get to the courts, which is good for practitioners and judges alike.  Stay tuned for an analysis of the arguments, as well as a link to the video.

Courts Increasingly Cognizant of eDiscovery Burdens, Reject “Gotcha” Sanctions Demands

Friday, May 18th, 2012

Courts are becoming increasingly cognizant of the eDiscovery burdens that the information explosion has placed on organizations. Indeed, the cases from 2012 are piling up in which courts have rejected demands that sanctions be imposed for seemingly reasonable information retention practices. The recent case of Grabenstein v. Arrow Electronics (D. Colo. April 23, 2012) is another notable instance of this trend.

In Grabenstein, the court refused to sanction a company for eliminating emails pursuant to a good faith document retention policy. The plaintiff had argued that drastic sanctions (evidence, adverse inference and monetary) should be imposed on the company since relevant emails regarding her alleged disability were not retained in violation of both its eDiscovery duties and an EEOC regulatory retention obligation. The court disagreed, finding that sanctions were inappropriate because the emails were not deleted before the duty to preserve was triggered: “Plaintiff has not provided any evidence that Defendant deleted e-mails after the litigation hold was imposed.”

Furthermore, the court declined to issue sanctions of any kind even though it found that the company deleted emails in violation of its EEOC regulatory retention duty. The court adopted this seemingly incongruous position because the emails were overwritten pursuant to a reasonable document retention policy:

“there is no evidence to show that the e-mails were destroyed in other than the normal course of business pursuant to Defendant’s e-mail retention policy or that Defendant intended to withhold unfavorable information from Plaintiff.”

The Grabenstein case reinforces the principle that reasonable information retention and eDiscovery processes can and often do trump sanctions requests. Just like the defendant in Grabenstein, organizations should develop and follow a retention policy that eliminates data stockpiles before litigation is reasonably anticipated. Grabenstein also demonstrates the value of deploying a timely and comprehensive litigation hold process to ensure that relevant electronically stored information (ESI) is retained once a preservation duty is triggered. These principles are consistent with various other recent cases, including a decision last month in which pharmaceutical giant Pfizer defeated a sanctions motion by relying on its “good faith business procedures” to eliminate legacy materials before a duty to preserve arose.

The Grabenstein holding also spotlights the role that proportionality can play in determining the extent of a party’s preservation duties. The Grabenstein court reasoned that sanctions would be inappropriate since plaintiff managed to obtain the destroyed emails from an alternative source. Without expressly mentioning “proportionality,” the court implicitly drew on Federal Rule of Civil Procedure 26(b)(2)(C) to reach its “no harm, no foul” approach to plaintiff’s sanctions request. Rule 26(b)(2)(C)(i) empowers a court to limit discovery when it is “unreasonably cumulative or duplicative, or can be obtained from some other source that is more convenient, less burdensome, or less expensive.” Given that plaintiff actually had the emails in question and there was no evidence suggesting other ESI had been destroyed, proportionality standards tipped the scales against the sanctions request.

The Grabenstein holding is good news for organizations looking to reduce their eDiscovery costs and burdens. By refusing to accede to a tenuous sanctions motion and by following principles of proportionality, the court sustained reasonableness over “gotcha” eDiscovery tactics. If courts adhere to the Grabenstein mantra that preservation and production should be reasonable and proportional, organizations truly stand a better chance of seeing their litigation costs and burdens reduced accordingly.

Will Predictive Coding Live Up to the eDiscovery Hype?

Monday, May 14th, 2012

The myriad of published material regarding predictive coding technology has almost universally promised reduced costs and lighter burdens for the eDiscovery world. Indeed, until the now famous order was issued in the Da Silva Moore v. Publicis Groupe case “approving” the use of predictive coding, many in the industry had parroted this “lower costs/lighter burdens” mantra like the retired athletes who chanted “tastes great/less filling” during the 1970s Miller Lite commercials. But a funny thing happened on the way to predictive coding satisfying the cost-cutting mandate of Federal Rule of Civil Procedure 1: the same old eDiscovery story of high costs and lengthy delays is plaguing the initial outlay of this technology. The three publicized cases involving predictive coding are particularly instructive on this early, but troubling, development.

Predictive Coding Cases

In Da Silva Moore v. Publicis Groupe, the plaintiffs’ attempt to recuse Judge Peck has diverted the spotlight from the costs and delays associated with the use of predictive coding. Indeed, the parties have been wrangling for months over the parameters of using this technology for defendant MSL’s document review. During that time, each side has incurred substantial attorney fees and other costs to address fairly routine review issues. These delays figure to continue, as the parties now project that MSL’s production will not be complete until September 7, 2012. Even that date seems too sanguine, particularly given Judge Peck’s recent observation about the slow pace of production: “You’re now woefully behind schedule already at the first wave.” Moreover, Judge Peck has suggested on multiple occasions that a special master be appointed to address disagreements over relevance designations. Special masters, production delays, additional briefings and related court hearings all lead to the inescapable conclusion that the parties will be saddled with a huge eDiscovery bill (despite presumptively lower review costs) due to the use of predictive coding technology.

The Kleen Products v. Packaging Corporation case is also plagued by cost and delay issues. As explained in our post on this case last month, the plaintiffs are demanding a “do-over” of the defendants’ document production, insisting that predictive coding technology be used instead of keyword search and other analytical tools. Setting aside the merits of plaintiffs’ arguments, the costs the parties have incurred in connection with this motion are quickly mounting. After the parties submitted briefings on the issues, the court has now held two hearings on the matter, including a full day of testimony from the parties’ experts. With another “Discovery Hearing” now on the docket for May 22nd, predictive coding has essentially turned an otherwise routine document production query into an expensive, time-consuming sideshow with no end in sight.

Cost and delay issues may very well trouble the parties in the Global Aerospace v. Landow Aviation matter, too. In Global Aerospace, the court acceded to the defendants’ request to use predictive coding technology over the plaintiffs’ objections. Despite allowing the use of such technology, the court provided plaintiffs with the opportunity to challenge the “completeness or the contents of the production or the ongoing use of predictive coding technology.” Such a condition essentially invites plaintiffs to re-litigate their objections through motion practice. Moreover, like the proverbial “exception that swallows the rule,” the order allows for the possibility that the court could withdraw its approval of predictive coding technology. All of which could lead to seemingly endless discovery motions, production “re-dos” and inevitable cost and delay issues.

Better Times Ahead?

At present, the Da Silva Moore, Kleen Products and Global Aerospace cases do not suggest that predictive coding technology will “secure the just, speedy, and inexpensive determination of every action and proceeding.” Nevertheless, there is room for considerable optimism that predictive coding will ultimately succeed. Technological advances in the industry will provide greater transparency into the black box of predictive coding technology that to date has not existed. Additional advances should also lead to easy-to-use workflow management consoles, which will in turn increase defensibility of the process and satisfy legitimate concerns regarding production results, such as those raised by the plaintiffs in Moore and Global Aerospace.

Technological advances that also increase the accuracy of first generation predictive coding tools should yield greater understanding and acceptance about the role predictive coding can play in eDiscovery. As lawyers learn to trust the reliability of transparent predictive coding, they will appreciate how this tool can be deployed in various scenarios (e.g., prioritization, quality assurance for linear review, full scale production) and in connection with existing eDiscovery technologies. In addition, such understanding will likely facilitate greater cooperation among counsel, a lynchpin for expediting the eDiscovery process. This is evident from the Moore, Kleen Products and Global Aerospace cases, where a lack of cooperation has caused increased costs and delays.

With the promise of transparency and simpler workflows, predictive coding technology should eventually live up to its billing of helping organizations discover their information in an efficient, cost-effective and defensible manner. For now, though, the “promise” of first-generation predictive coding tools appears to be nothing more than that, leaving organizations looking like the cash-strapped “Monopoly man,” wondering where their litigation dollars have gone.