Posts Tagged ‘analysis’

Defensible Deletion: The Cornerstone of Intelligent Information Governance

Tuesday, October 16th, 2012

The struggle to stay above the rising tide of information is a constant battle for organizations. Not only are the costs and logistics associated with data storage more troubling than ever, but so are the potential legal consequences. Indeed, the news headlines are constantly filled with horror stories of jury verdicts, court judgments and unreasonable settlements involving organizations that failed to effectively address their data stockpiles.

While there are no quick or easy solutions to these problems, an increasingly popular method for dealing with them is an organizational strategy referred to as defensible deletion. The term can cover many things, but at its core defensible deletion is a comprehensive approach that companies implement to reduce the storage costs and legal risks associated with the retention of electronically stored information (ESI). Organizations that have adopted such a strategy have been successful in avoiding court sanctions while at the same time eliminating ESI that has little or no business value.

The first step in implementing a defensible deletion strategy is for organizations to ensure that they have a top-down plan for addressing data retention. This typically requires that the principal information governance stakeholders – legal and IT – cooperate with each other. These departments must also work jointly with records managers and business units to decide what data must be kept and for how long. All of these stakeholders in information retention must be engaged and collaborate if the organization is to create a workable defensible deletion strategy.

Cooperation between legal and IT naturally leads the organization to establish records retention policies, which carry out the key players’ decisions on data preservation. Such policies should address the particular needs of an organization while balancing them against litigation requirements. Not only will that enable a company to reduce its costs by decreasing data proliferation, it will also minimize the company’s litigation risks by limiting the amount of potentially relevant information available for current and follow-on litigation.

In like manner, legal should work with IT to develop a process for how the organization will address document preservation during litigation. This will likely involve the designation of officials who are responsible for issuing a timely and comprehensive litigation hold to custodians and data sources. This will ultimately help an organization avoid the mistakes that often plague document management during litigation.

The Role of Technology in Defensible Deletion

In the digital age, an essential aspect of a defensible deletion strategy is technology. Indeed, without innovations such as archiving software and automated legal hold acknowledgements, it will be difficult for an organization to achieve its defensible deletion objectives.

On the information management side of defensible deletion, archiving software can help enforce an organization’s retention policies and thereby reduce data volume and related storage costs. This can be accomplished with classification tools, which intelligently analyze and tag content as it is ingested into the archive. In this way, organizations retain information that is significant or that otherwise must be kept for business, legal or regulatory purposes – and nothing else.

An archiving solution can also reduce costs through efficient data storage. By expiring data in accordance with the organization’s retention policies and by using single instance storage to eliminate duplicate ESI, archiving software frees up space on company servers and ultimately decreases storage costs. It also lessens litigation risk by reducing the volume of data available for future litigation.
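
To make the expiry mechanic concrete, here is a minimal sketch of how an archive might apply age-based retention rules. The record classes, retention periods and function name are illustrative assumptions, not a description of any particular product.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical retention schedule: record class -> retention period.
# In practice these periods are set jointly by legal, records management and the business units.
RETENTION_SCHEDULE = {
    "correspondence": timedelta(days=3 * 365),
    "contract": timedelta(days=7 * 365),
    "transient": timedelta(days=90),
}

def is_expired(record_class: str, ingested_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when a document has outlived its retention period and may be expired."""
    now = now or datetime.utcnow()
    period = RETENTION_SCHEDULE.get(record_class)
    if period is None:
        # Unclassified content is held for review rather than silently deleted.
        return False
    return now - ingested_at > period

# A routine email ingested in 2008 has passed its three-year retention period.
print(is_expired("correspondence", datetime(2008, 10, 16)))  # True
```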

On the eDiscovery side of defensible deletion, an eDiscovery platform with the latest in legal hold technology is often essential for enabling a workable litigation hold process. Effective platforms automate legal hold notices and acknowledgements from custodians across multiple cases. This allows organizations to confidently place data on hold through a single user action and eliminates the concern that ESI may slip through the proverbial cracks of manual hold practices.

Organizations experience every day the costly consequences of delaying implementation of a defensible deletion program. This trend can be reversed through a common-sense defensible deletion strategy which, when powered by effective, enabling technologies, can help organizations decrease the costs and risks associated with the information explosion.

Responsible Data Citizens Embrace Old World Archiving With New Data Sources

Monday, October 8th, 2012

The times are changing rapidly as the data explosion mushrooms, but the more things change, the more they stay the same. In the archiving and eDiscovery world, organizations are increasingly pushing content from multiple data sources into information archives. Email was the first data source to take the plunge into the archive, but others are following quickly as both the amount of data we create (volume) and the types of data sources (variety) increase. While email is still a paramount data source for litigation, internal and external investigations and compliance, other data sources – namely social media and SharePoint – are quickly catching up.

This transformation is happening for multiple reasons. The main driver of this push of different data varieties into the archive is that centralizing an organization’s data is paramount to healthy information governance. For organizations that have deployed archiving and eDiscovery technologies, the ability to archive multiple data sources is the Shangri-La they have been looking for to increase efficiency and create a more holistic and defensible workflow.

Organizations can now deploy document retention policies across multiple content types within one archive and can identify, preserve and collect from the same, singular repository. No longer do separate retention policies need to apply to data that originated in different repositories. The increased ability to archive more data sources into a centralized archive provides for unparalleled storage, deduplication, document retention, defensible deletion and discovery benefits in an increasingly complex data environment.

Prior to this capability, SharePoint was another data source in the wild that needed separate treatment. This meant that in-place legal hold, as well as insight into the corpus of data, was not as straightforward as it was for email. This lack of transparency into the organization’s data environment for early case assessment led to unnecessary outsourcing, over-collection and disparate, time-consuming workflows. All of these drawbacks cost organizations money, resources and time that could be better utilized elsewhere.

Bringing data sources like SharePoint into an information archive improves an organization’s ability to comply with document retention schedules and legal hold requirements, and to reap the benefits of a comprehensive information governance program. If SharePoint is where an organization’s employees are storing documents that are valuable to the business, order needs to be brought to that repository.

Additionally, many projects are abandoned and left to die on the vine in SharePoint. These projects need to be expired and that capacity recycled for a higher business purpose. Archives can now ingest document libraries, wikis, discussion boards, custom lists, “My Sites” and SharePoint social content for increased storage optimization, retention and expiration of content, and eDiscovery. As a result, organizations can better manage complex projects such as migrations, versioning, site consolidations and expiration with SharePoint archiving.

Data can be analogized to a currency, with the archive as the bank. In treating data as a currency, organizations must ask themselves: why are companies valued the way they are on Wall Street? Companies that provide services, or services in combination with products, are often valued on their customer lists, on consumer data that can be repurposed (Facebook), and on various other databases. A recent Forbes article discusses people, value and brand as predominant indicators of value.

While these valuation metrics are sound, they stop short of measuring the quality of the actual data within an organization, or examining whether it is organized and protected. They also do not consider the risks and benefits of how the data is stored and protected, or whether it is searchable. Yet the value of the data inside a company is what supports all three of the aforementioned indicators, without exception. Without managing the data in an organization, not only do eDiscovery and storage costs become a legal and financial risk, but all three of those indicators are compromised.

If employee data is not managed/monitored appropriately, if the brand is compromised due to lack of social media monitoring/response, or if litigation ensues without the proper information governance plan, then value is lost because value has not been assessed and managed. Ultimately, an organization is only as good as its data, and this means there’s a new asset on Wall Street – data.

It’s not a new concept to archive email, nor is it novel to regard data as an asset. Data has simply been a less understood asset because, even though organizations create massive amounts of it each day, storage has become cheap. SharePoint is increasingly being archived because more critical data is being stored there, including business records, contracts and social media content. Organizations tend not to fear what they cannot see until an event forces them to go back and collect, analyze and review that data. Costs associated with this reactive eDiscovery process can range from $3,000 to $30,000 a gigabyte, compared with roughly 20 cents per gigabyte for storage. The downstream eDiscovery costs are obviously significant, especially as organizations begin to deal in terabytes and beyond.

Hence, plus ça change, plus c’est la même chose – the more things change, the more they stay the same – and we will see this trend continue as organizations push more valuable data into the archive and expire data that has no value. Multiple data sources have been collection sources for some time, but the ease of pulling everything into an archive is allowing for economies of scale and increased defensibility in data management. This will decrease the risks associated with litigation and compliance, as well as boost the value of companies.

The Malkovich-ization of Predictive Coding in eDiscovery

Tuesday, August 14th, 2012

In the 1999 Academy Award-nominated movie Being John Malkovich, there’s a scene where the eponymous character is transported into his own body via a portal and everyone around him looks exactly like him. All the characters can say is “Malkovich,” as if this single word conveys everything to everyone.

In the eDiscovery world it seems lately like predictive coding has been Malkovich-ized, in the sense that it’s the start and end of every discussion. We here at eDiscovery 2.0 are similarly unable to break free of predictive coding’s gravitational pull – but we’ve attempted to give the use of this emerging technology some context, in the form of a top ten list.

So, without further ado, here are the top ten important items to consider with predictive coding and eDiscovery generally…

1. Perfection Is Not Required in eDiscovery

While not addressing predictive coding per se, it’s important to understand the litmus test for eDiscovery efforts. Regardless of the tools or techniques utilized to respond to document requests in electronic discovery, perfection is not required. The goal should be to create a reasonable and repeatable process to establish defensibility in the event you face challenges by the court or an opposing party. Make sure the predictive coding application (and broader eDiscovery platform you choose) functions correctly, is used properly and can generate reports illustrating that a reasonable process was followed. Remember, making smart decisions to establish a repeatable and defensible process early will inevitably reduce the risk of downstream problems.

2. Predictive Coding Is Just One Tool in the Litigator’s Tool-belt

Although the right predictive coding tools can reduce the time and cost of document review and improve accuracy rates, they are not a substitute for other important technology tools. Keyword search, concept search, domain filtering, and discussion threading are only a few of the other important tools in the litigator’s tool-belt that can and should be used together with predictive coding. Invest in an eDiscovery platform that contains a wide range of seamlessly integrated eDiscovery tools that work together to ensure the simplest, most flexible, and most efficient eDiscovery process.

3. Using Predictive Coding Tools Properly Makes All the Difference

Electronic discovery applications, like most technology solutions, are only effective if deployed properly. Since many early-generation tools are not intuitive, learning how to use a given predictive coding tool properly is critical to eDiscovery success. To maximize chances for success and minimize the risk of problems, select trustworthy predictive coding applications supported by reputable providers and make sure to learn how to use the solutions properly.

4. Predictive Coding Isn’t Just for Big Cases

Sometimes predictive coding applications must be purchased separately from other eDiscovery tools; other times additional fees may be required to use predictive coding. As a result, many practitioners only consider predictive coding for the largest cases, to ensure the cost of eDiscovery doesn’t exceed the value of the case. If possible, invest in an electronic discovery solution that includes predictive coding as part of an integrated eDiscovery platform containing legal hold, collection, processing, culling, analysis, and review capabilities at no additional charge. Since the cost of using different predictive coding tools varies dramatically, make sure to select a tool at the right price point to maximize economic efficiencies across multiple cases, regardless of size.

5. Investigate the Solution Providers

Not all predictive coding applications are created equal. The tools vary significantly in price, usability, performance and overall reputation. Although trustworthy, independent information comparing different predictive coding solutions is limited, information about the companies creating these applications is available. Make sure to review independent research from analysts such as Gartner, Inc. as part of the vetting process instead of starting from scratch.

6. Test Drive Before You Buy

Savvy eDiscovery technologists take steps to ensure that the predictive coding application they are considering works within their organization’s environment and on their organization’s data. Product demonstrations are important, but testing products internally through a proof of concept evaluation is even more important if you are contemplating bringing an eDiscovery platform in house. Additionally, check company references before investing in a solution to find out how others feel about the software they purchased and the level of product support they receive.

7. Defensibility Is Paramount

Although predictive coding tools can save organizations money through increased efficiency, the relative newness and complexity of the technology can create risk. To avoid this risk, choose a predictive coding tool that is easy to use, developed by an industry leading company and fully supported.

8. Statistical Methodology and Product Training Are Critical

The underlying statistical methodology behind any predictive coding application is critical to the defensibility of the entire eDiscovery process. Many providers fail to incorporate a product workflow for selecting a properly sized control set in certain situations. Unfortunately, this oversight could unwittingly result in misrepresentations to the court and opposing parties about the system’s performance. Select providers capable of illustrating the statistical methodology behind their approach and that are capable of providing proper training on the use of their system.
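
The post does not spell out what a “properly sized” control set means, but one common statistical approach is to size the random sample so that estimates such as richness or recall carry a known margin of error at a given confidence level. Below is a rough sketch using the standard sample-size formula for estimating a proportion; the function name and defaults are ours.

```python
import math

def control_set_size(margin_of_error: float, z: float = 1.96,
                     population: int = 0, p: float = 0.5) -> int:
    """Sample size for estimating a proportion within +/- margin_of_error
    at the confidence level implied by z (1.96 ~ 95%). p = 0.5 is the worst case."""
    n = (z ** 2) * p * (1 - p) / (margin_of_error ** 2)
    if population:  # finite population correction for smaller collections
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

# Roughly 385 randomly selected documents support a +/-5% margin at 95% confidence,
# whether the collection holds 50,000 documents or 5 million.
print(control_set_size(0.05))                     # 385
print(control_set_size(0.05, population=50_000))  # 382
```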

9. Transparency Is Key

Many practitioners are legitimately concerned that early-generation predictive coding solutions operate as a “black box,” meaning the way they work is difficult to understand and/or explain. Since it is hard to defend technology that is difficult to understand, selecting a solution and process that can be explained in court is critical. Make sure to choose a predictive coding solution that is transparent to avoid allegations by opponents that your tool is “black box” technology that cannot be trusted.

10. Align with Attorneys You Trust

The fact that predictive coding is relatively new to the legal field and can be more complex than traditional approaches to eDiscovery highlights the importance of aligning with trusted legal counsel. Most attorneys defer legal technology decisions to others on their legal team and have little practical experience using these solutions themselves. Conversational knowledge about these tools isn’t enough given the confusion, complexity, and risk related to selecting the wrong tool or using the applications improperly. Make sure to align with an attorney who possesses hands-on experience and who is able to articulate specific reasons for preferring a particular solution or approach.

Hopefully this top ten list can ensure that your use of “predictive coding” isn’t Malkovich-ized – meaning you understand when, how and why you’re deploying this particular eDiscovery technology. Without the right context, the eDiscovery industry risks overusing this term and in turn over-hyping this exciting next chapter in process improvement.

Why Half Measures Aren’t Enough in Predictive Coding

Thursday, July 26th, 2012

In part 2 of our predictive coding blog series, we highlighted some of the challenges in measuring and communicating the accuracy of computer predictions. But what exactly do we mean when we refer to accuracy? In this post, I will cover the various metrics used to assess the accuracy of predictive coding.

The most intuitive method for measuring the accuracy of predictions is to simply calculate the percentage of documents the software predicted correctly.  If 80 out of 100 documents are correctly predicted, the accuracy should be 80%. This approach is one of the standard methods used in many other disciplines. For example, a test score in school is often calculated by taking the number of questions answered correctly, dividing that by the total number of questions on the test, then multiplying the resulting number by 100 to get a percentage value. Wouldn’t it make sense to apply the same method for measuring the accuracy of predictive coding? Surprisingly, the answer is actually, “no.”

This approach is problematic because in eDiscovery the goal is not to determine the number of all documents tagged correctly, but rather the number of responsive documents tagged correctly. Let’s assume there are 50,000 documents in a case and each document has been reviewed by both a human and the computer, resulting in the human-computer comparison chart shown below.

Row #1 – Human: Responsive; Computer: Responsive – 2,000 documents
Row #2 – Human: Responsive; Computer: Non-responsive – 6,000 documents
Row #3 – Human: Non-responsive; Computer: Non-responsive – 40,000 documents
Row #4 – Human: Non-responsive; Computer: Responsive – 2,000 documents
Total – 50,000 documents

Based on this chart, we can see that out of 50,000 total documents, the software predicted 42,000 documents (sum of row #1 and #3) correctly and therefore its accuracy is 84% (42,000/50,000).

However, analyzing the chart closely reveals a very different picture. The results of the human review show that there are 8,000 total responsive documents (sum of row #1 and #2), but the software found only 2,000 of those (row #1). This means the computer found only 25% of the truly responsive documents. This measure is called Recall.

Also, of the 4,000 documents that the computer predicted as responsive (the sum of row #1 and #4), only 2,000 are actually responsive (row #1), meaning the computer is right only 50% of the time when it predicts a document to be responsive. This is called Precision.

So, why are Recall and Precision so low – only 25% and 50%, respectively – when the computer’s predictions are correct for 84% of the documents? It is because the software did very well at predicting non-responsive documents. Based on the human review, there are 42,000 non-responsive documents (sum of row #3 and #4), of which the software correctly identified 40,000, meaning the computer is right 95% of the time (40,000/42,000) when it predicts a document to be non-responsive. While the software is right only 50% of the time when predicting a document responsive, it is right 95% of the time when predicting a document non-responsive, so its overall predictions across all documents are correct 84% of the time.

In eDiscovery, parties are required to take reasonable steps to find documents.  The example above illustrates that the “percentage of correct predictions across all documents” metric may paint an inaccurate view of the number of responsive documents found or missed by the software. This is especially true when most of the documents in a case are non-responsive, which is the most common scenario in eDiscovery. Therefore, Recall and Precision, which accurately track the number of responsive documents found and missed, are better metrics for measuring accuracy of predictions, since they measure what the eDiscovery process is seeking to achieve.

However, measuring and tracking both metrics independently can be cumbersome, especially if the end goal is to achieve higher accuracy on both measures overall. A single metric called F-measure, the harmonic mean of Precision and Recall, can be used instead; it is designed to strike a balance between the two. A higher F-measure typically indicates higher precision and recall, and a lower F-measure typically indicates lower precision and recall.
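
Using the numbers from the example above, the arithmetic behind all three metrics can be sketched in a few lines of code (the variable names are ours, and the F-measure shown is the balanced form).

```python
# Counts from the 50,000-document example: human review vs. computer prediction.
true_positives = 2_000     # responsive, predicted responsive (row #1)
false_negatives = 6_000    # responsive, predicted non-responsive (row #2)
true_negatives = 40_000    # non-responsive, predicted non-responsive (row #3)
false_positives = 2_000    # non-responsive, predicted responsive (row #4)

total = true_positives + false_negatives + true_negatives + false_positives

accuracy = (true_positives + true_negatives) / total             # 0.84
recall = true_positives / (true_positives + false_negatives)     # 0.25
precision = true_positives / (true_positives + false_positives)  # 0.50
f_measure = 2 * precision * recall / (precision + recall)        # ~0.33 (harmonic mean)

print(f"Accuracy {accuracy:.0%}, Recall {recall:.0%}, "
      f"Precision {precision:.0%}, F-measure {f_measure:.0%}")
```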

These three metrics – Precision, Recall and F-measure – are the most widely accepted standards for measuring the accuracy of computer predictions. As a result, users of predictive coding are looking for solutions that provide a way to measure prediction accuracy in all three. The most advanced solutions have built-in measurement workflows and tracking mechanisms.

There is no standard target for Recall, Precision or F-measure. It is up to the parties involved in eDiscovery to determine a “reasonable” percentage based on time, cost and risk trade-offs. A higher percentage means higher accuracy – but it also means higher eDiscovery costs, as the software will likely require more training. For high-risk matters, 80%, 90% or even higher Recall may be required, but for lower-risk matters, 70% or even 60% may be acceptable. It should be noted that academic studies analyzing the effectiveness of linear review show widely varying review quality. One study comparing the accuracy of manual review with technology-assisted review found that manual review achieved, on average, 59.3% recall, compared with an average recall of 76.7% for technology-assisted review such as predictive coding.

#InfoGov Twitter Chat Homes in on Starting Places and Best Practices

Tuesday, July 3rd, 2012

Unless you’re an octogenarian living in rural Uzbekistan,[i] you’ve likely seen the meteoric rise of social media over the last decade. Even beyond hyper-texting teens, businesses too are taking advantage of this relatively new medium to engage with their more technically savvy customers. Recently, Symantec held its first “Twitter Chat” on the topic of information governance (fondly referred to on Twitter as #InfoGov). For those not familiar with the concept, a Twitter Chat is a virtual discussion held on Twitter using a specific hashtag – in this case #IGChat. At a set date and time, parties interested in the topic log into Twitter and start participating in the fireworks on the designated hashtag.

“Fireworks” may be a bit overstated, but given that the moderators (eDiscovery Counsel at Symantec) and participants were limited to 140 characters, the “conversation” was certainly frenetic. Despite the fast pace, one benefit of a Twitter Chat is that you can communicate with shortened web links, as a way to share and discuss content beyond the severely limited character count. During this somewhat staccato discussion, we found the conversation taking some interesting twists and turns, which I thought I’d excerpt (and expound upon[ii]) in this blog.

Whether in a Twitter Chat or otherwise, once the discussion of information governance begins everyone wants to know where to start. The #IGChat was no different.

  • Where to begin?  While there wasn’t consensus per se on a good starting place, one cogent remark out of the blocks was: “The best way to start is to come up with an agreed upon definition — Gartner’s is here t.co/HtGTWN2g.” While the Gartner definition is a good starting place, there are others out there that are more concise. The eDiscovery Journal Group has a good one as well:  “Information Governance is a comprehensive program of controls, processes, and technologies designed to help organizations maximize the value of information assets while minimizing associated risks and costs.”  Regardless of the precise definition, it’s definitely worth the cycles to rally around a set construct that works for your organization.
  • Who’s on board?  The next topic centered around trying to find the right folks organizationally to participate in the information governance initiative. InfoGovlawyer chimed in: “Seems to me like key #infogov players should include IT, Compliance, Legal, Security reps.” Then, PhilipFavro suggested that the “[r]ight team would likely include IT, legal, records managers, pertinent business units and compliance.” Similar to the previous question, at this stage in the information governance maturation process, there isn’t a single, right answer. More importantly, the team needs to have stakeholders from at least Legal and IT, while bringing in participants from other affected constituencies (Infosec, Records, Risk, Compliance, etc.) – basically, anyone interested in maximizing the value of information while reducing the associated risks.
  • Where’s the ROI?  McManusNYLJ queried: “Do you think #eDiscovery, #archiving and compliance-related technology provide ample ROI? Why or why not?”  Here, the comments came in fast and furious. One participant pointed out that case law can be helpful in showing the risk reduction:  “Great case showing the value of an upstream archive – Danny Lynn t.co/dcReu4Qg.” AlliWalt chimed in: “Yes, one event can set your company back millions…just look at the Dupont v. Kolon case… ROI is very real.” Another noted that “Orgs that take a proactive approach to #eDiscovery requests report a 64% faster response time, 2.3x higher success rate.” And, “these same orgs were 78% less likely to be sanctioned and 47% less likely to be legally compromised t.co/5dLRUyq6.” ROI for information governance seemed to be a nut that can be cracked any number of ways, ranging from risk reduction (via sanctions and adverse legal decisions) to better preparation. Here too, an organization’s particular sensitivities should come into play since all entities won’t have the same concerns about risk reduction, for example.
  • Getting Granular. Pegduncan, an active subject matter expert on the topic, noted that showing ROI was the right idea, but not always easy to demonstrate: “But you have to get their attention. Hard to do when IT is facing funding challenges.” This is when granular eDiscovery costs were mentioned: “EDD costs $3 -18k per gig (Rand survey) and should wake up most – adds up w/ large orgs having 147 matters at once.” Peg wasn’t that easily convinced: “Agreed that EDD costs are part of biz case, but .. it’s the problem of discretionary vs non-discretionary spending.”
  • Tools Play a Role. One participant asked: “what about tools for e-mail thread analysis, de-duplication, near de-duplication – are these applicable to #infogov?” A participant noted that “in the future we will see tools like #DLP and #predictivecoding used for #infogov auto-classification – more on DLP here: t.co/ktDl5ULe.” Pegduncan chimed in that “DLP=Data Loss Prevention. Link to Clearwell’s post on Auto-Classification & DLP t.co/ITMByhbj.”

With a concept as broad and complex as information governance, it’s truly amazing that a cogent “conversation” can take place in a series of 140 character tweets. As the Twitter Chat demonstrates, the information governance concept continues to evolve and is doing so through discussions like this one via a social media platform. As with many of the key information governance themes (Ownership, ROI, Definition, etc.) there isn’t a right answer at this stage, but that isn’t an excuse for not asking the critical questions. “Sooner started, sooner finished” is a motto that will serve many organizations well in these exciting times. And, for folks who say they can’t spare the time, they’d be amazed what they can learn in 140 characters.

Mark your calendars and track your Twitter hashtags now: The next #IGChat will be held on July 26 @ 10am PT.



[i] I’ve never been to rural Uzbekistan, but it just sounded remote. So, my apologies if there’s a world-class internet infrastructure there where the denizens tweet prolifically. Given that it’s one of only two doubly landlocked countries in the world, it seemed like an easy target. Uzbeks, please feel free to use the comment field and set me straight.

[ii] Minor edits were made to select tweets, but generally the shortened Twitter grammar wasn’t changed.

Kleen Products Predictive Coding Update – Judge Nolan: “I am a believer of principle 6 of Sedona”

Tuesday, June 5th, 2012

Recent transcripts reveal that 7th Circuit Magistrate Judge Nan Nolan has urged the parties in Kleen Products, LLC, et. al. v. Packaging Corporation of America, et. al. to focus on developing a mutually agreeable keyword search strategy for eDiscovery instead of debating whether other search and review methodologies would yield better results. This is big news for litigators and others in the electronic discovery space because many perceived Kleen Products as potentially putting keyword search technology on trial, compared to newer technology like predictive coding. Considering keyword search technology is still widely used in eDiscovery, a ruling by Judge Nolan requiring defendants to redo part of their production using technology other than keyword searches would sound alarm bells for many litigators.

The controversy surrounding Kleen Products relates both to Plaintiffs’ position, as well as the status of discovery in the case. Plaintiffs initially asked Judge Nolan to order Defendants to redo their previous productions and all future productions using alternative technology.  The request was surprising to many observers because some Defendants had already spent thousands of hours reviewing and producing in excess of one million documents. That number has since surpassed three million documents.  Among other things, Plaintiffs claim that if Defendants had used “Content Based Advanced Analytics” tools (a term they did not define) such as predictive coding technology, then their production would have been more thorough. Notably, Plaintiffs do not appear to point to any instances of specific documents missing from Defendants’ productions.

In response, Defendants countered that their use of keyword search technology, and their eDiscovery methodology in general, was extremely rigorous and thorough. More specifically, they highlighted their use of advanced culling and analysis tools (such as domain filtering and email threading) in addition to keyword search tools. Defendants also claim they cooperated with Plaintiffs by allowing them to participate in the selection of the keywords used to search for relevant documents. Perhaps going above and beyond the eDiscovery norm, Defendants even instituted a detailed document sampling approach designed to measure the quality of their document productions.

Following two full days of expert witness testimony regarding the adequacy of Defendants’ productions, Judge Nolan finally asked the parties to try to reach a compromise on the “Boolean” keyword approach. She apparently reasoned that having the parties work out a mutually agreeable approach based on what Defendants had already implemented was preferable to scheduling yet another full day of expert testimony – even though additional expert testimony is still an option.

In a nod to the Sedona Principles, she further explained her rationale on March 28, 2012, at the conclusion of the second day of testimony:

“the defendants had done a lot of work, the defendant under Sedona 6 has the right to pick the [eDiscovery] method. Now, we all know, every court in the country has used Boolean search, I mean, this is not like some freak thing that they [Defendants] picked out…”

Judge Nolan’s reliance on the Sedona Best Practices Recommendations & Principles for Addressing Electronic Document Production reveals how she would likely rule if Plaintiffs renew their position that Defendants should have used predictive coding or some other kind of technology in lieu of keyword searches. Sedona Principle 6 states that:

“[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

In other words, Judge Nolan confirmed that in her court, opposing parties typically may not dictate what technology solutions their opponents must use without some indication that the technology or process used failed to yield accurate results. Judge Nolan also observed that quality and accuracy are key guideposts regardless of the technology utilized during the eDiscovery process:

“what I was learning from the two days, and this is something no other court in the country has really done too, is how important it is to have quality search. I mean, if we want to use the term “quality” or “accurate,” but we all want this…– how do you verify the work that you have done already, is the way I put it.”

Although Plaintiffs have reserved their right to reintroduce their technology arguments, recent transcripts suggest that Defendants will not be required to use different technology. Plaintiffs continue to meet and confer with individual Defendants to agree on keyword searches, as well as the types of data sources that must be included in the collection. The parties and Judge also appear to agree that they would like to continue making progress with 30(b)(6) depositions and other eDiscovery issues before Judge Nolan retires in a few months, rather than begin a third day of expert hearings regarding technology related issues. This appears to be good news for the Judge and the parties since the eDiscovery issues now seem to be headed in the right direction as a result of mutual cooperation between the parties and some nudging by Judge Nolan.

There is also good news for outside observers in that Judge Nolan has provided some sage guidance to help future litigants before she steps down from the bench. First, it is clear that Judge Nolan and other judges continue to emphasize the importance of cooperation in today’s complex new world of technology; parties should be prepared to cooperate and be more transparent during discovery given the judiciary’s increased reliance on the Sedona Cooperation Proclamation. Second, Kleen Products illustrates that keyword search is not dead. Instead, keyword search should be viewed as one of many tools in the Litigator’s Toolbelt™ that can be used alongside other tools such as email threading, advanced filtering technology, and even predictive coding. Finally, litigators should take note that regardless of the tools they select, they must be prepared to defend their process and use of those tools or risk the scrutiny of judges and opposing parties.

Gartner’s “2012 Magic Quadrant for E-Discovery Software” Provides a Useful Roadmap for Legal Technologists

Tuesday, May 29th, 2012

Gartner has just released its 2012 Magic Quadrant for E-Discovery Software, an annual report that analyzes the state of the electronic discovery industry and provides a detailed vendor-by-vendor evaluation. For many, particularly those in IT circles, Gartner is an unwavering north star used to divine software market leaders in areas ranging from business intelligence platforms to wireless LAN infrastructure. When IT professionals are on the cusp of procuring complex software, they look to analysts like Gartner for quantifiable and objective recommendations – as a way to inform and buttress their own internal decision-making processes.

But for some in the legal technology field (particularly attorneys), looking to Gartner for software analysis can seem a bit foreign. Legal practitioners are often more comfortable with the “good ole days” when the only navigation aid in the eDiscovery world was provided by the dynamic duo of George Socha and Tom Gelbmann, who (beyond creating the EDRM) were pioneers of the first eDiscovery rankings survey. Albeit somewhat short-lived, their Annual Electronic Discovery[i] Survey ranked the hundreds of eDiscovery providers and bucketed the top-tier players in both software and litigation support categories. The scope of their mission was grand, and they were perhaps ultimately undone by the breadth of their task (they stopped the Survey in 2010), particularly as the eDiscovery landscape continued to mature, fragment and evolve.

Gartner, which has perfected the analysis of emerging software markets, appears to have taken on this challenge with an admittedly more narrow (and likely more achievable) focus. Gartner published its first Magic Quadrant (MQ) for the eDiscovery industry last year, and in the 2012 Magic Quadrant for E-Discovery Software report they’ve evaluated the top 21 electronic discovery software vendors. As with all Gartner MQs, their methodology is rigorous; in order to be included, vendors must meet quantitative requirements in market penetration and customer base and are then evaluated upon criteria for completeness of vision and ability to execute.

By eliminating the legion of service providers and law firms, Gartner has made their mission both more achievable and perhaps (to some) less relevant. When talking to certain law firms and litigation support providers, some seem to treat the Gartner initiative (and subsequent Magic Quadrant) like a map from a land they never plan to visit. But, even if they’re not directly procuring eDiscovery software, the Gartner MQ should still be seen by legal technologists as an invaluable tool to navigate the perils of the often confusing and shifting eDiscovery landscape – particularly with the rash of recent M&A activity.

Beyond the quadrant positions[ii], comprehensive analysis and secular market trends, one of the key underpinnings of the Magic Quadrant is that the ultimate position of a given provider is in many ways an aggregate measurement of overall customer satisfaction. Similar in ways to the net promoter concept (a tool that gauges the loyalty of a firm’s customer relationships simply by asking how likely a customer is to recommend a product or service to a colleague), the Gartner MQ can be looked at as the sum total of all customer experiences.[iii] As such, this usage and satisfaction feedback is relevant even for parties that aren’t purchasing or deploying electronic discovery software per se. Outside counsel, partners, litigation support vendors and other interested parties may all end up interacting with a deployed eDiscovery solution (particularly as such solutions expand their reach as end-to-end information governance platforms), and they should want their chosen solution to be used happily and seamlessly in a given enterprise. There’s no shortage of stories about unhappy outside counsel, for example, who complain about being hamstrung by a slow, first-generation eDiscovery solution that ultimately makes their job harder (and riskier).

Next, the Gartner MQ is also a good shorthand way to understand more nuanced topics like time to value and total cost of ownership. While of course related to overall satisfaction, the Magic Quadrant indirectly addresses whether the software does what it says it will (delivering on the promise) in the time frame that is claimed (delivering the promise in a reasonable time frame), since these elements are typically subsumed in the satisfaction metric. This kind of detail surfaces in the numerous behind-the-scenes interviews that Gartner conducts, querying usage and overall satisfaction.

While no navigation aid ensures that a traveler won’t get lost, the Gartner Magic Quadrant for E-Discovery Software is a useful map of the electronic discovery software world. And, particularly looking at year-over-year trends, the MQ provides a useful way for legal practitioners (beyond the typical IT users) to get a sense of the electronic discovery market landscape as it evolves and matures. After all, staying on top of the eDiscovery industry has a range of benefits beyond just software procurement.

Please register here to access the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.



[i] Note, in the good ole days folks still used two words to describe eDiscovery.

[ii] Gartner has a proprietary matrix that it uses to place the entities into four quadrants: Leaders, Challengers, Visionaries and Niche Players.

[iii] Under the Ability to Execute axis Gartner weighs a number of factors including “Customer Experience: Relationships, products and services or programs that enable clients to succeed with the products evaluated. Specifically, this criterion includes implementation experience, and the ways customers receive technical support or account support. It can also include ancillary tools, the existence and quality of customer support programs, availability of user groups, service-level agreements and so on.”

7th Circuit eDiscovery Pilot Program Tackles Technology Assisted Review With Mock Arguments

Tuesday, May 22nd, 2012

The 7th Circuit eDiscovery Pilot Program’s Mock Argument is the first of its kind and is slated for June 14, 2012. It is not surprising that the Seventh Circuit’s eDiscovery Pilot Program would be the first to host an event like this on predictive coding, as the program has been a progressive model across the country for eDiscovery protocols since 2009. The predictive coding event is open to the public (registration required) and showcases the expertise of leading litigators, technologists and experts from all over the United States. Speakers include: Jason R. Baron, Director of Litigation at the National Archives and Records Administration; Maura R. Grossman, Counsel at Wachtell, Lipton, Rosen & Katz; Dr. David Lewis, Technology Expert and co-founder of the TREC Legal Track; Ralph Losey, Partner at Jackson Lewis; Matt Nelson, eDiscovery Counsel at Symantec; Lisa Rosen, President of Rosen Technology Resources; Jeff Sharer, Partner at Sidley Austin; and Tomas Thompson, Senior Associate at DLA Piper.

The eDiscovery 2.0 blog has extensively covered the three recent predictive coding cases currently being litigated, and while real court cases are paramount to the direction of predictive coding, the 7th Circuit program will proactively address a scenario that has not yet been considered by a court. In Da Silva Moore, the parties agreed to the use of predictive coding, but couldn’t subsequently agree on the protocol. In Kleen, plaintiffs want defendants to redo their review process using predictive coding even though the production is 99% complete. And, in Global Aerospace, the defendant proactively petitioned to use predictive coding over plaintiff’s objections. By contrast, the 7th Circuit’s hypothetical anticipates another likely predictive coding scenario: one where a defendant already has an in-house solution deployed and argues against the use of predictive coding before discovery has begun.

Traditionally, courts have been reluctant to bless or condemn particular technologies, preferring instead to rule on the reasonableness of an organization’s process and to depend on expert testimony for issues beyond that scope. It is expected that predictive coding will follow suit; however, because so little is understood about how the technology works, interest has been generated in a way the legal technology industry has not seen before, as evidenced by this tactical program.

* * *

The hypothetical dispute is a complex litigation matter pending in a U.S. District Court involving a large public corporation that has been sued by a smaller high-tech competitor for alleged anticompetitive conduct, unfair competition and various business torts.  The plaintiff has filed discovery requests that include documents and communications maintained by the defendant corporation’s vast international sales force.  To expedite discovery and level the playing field in terms of resources and costs, the Plaintiff has requested the use of predictive coding to identify and produce responsive documents.  The defendant, wary of the latest (and untested) eDiscovery technology trends, argues that the organization already has a comprehensive eDiscovery program in place.  The defendant will further argue that the technological investment and defensible processes in-house are more than sufficient for comprehensive discovery, and in fact, were designed in order to implement a repeatable and defensible discovery program.  The methodology of the defendant is estimated to take months and result in the typical massive production set, whereas predictive coding would allegedly make for a shorter discovery period.  Because of the burden, the defendant plans to shift some of these costs to the plaintiff.

Ralph Losey will serve as the Magistrate Judge; defense counsel will be Martin T. Tully (partner, Katten Muchin Rosenman LLP), with Karl Schieneman (of Review Less/ESI Bytes) as the corporation’s litigation support manager; and plaintiff’s counsel will be Sean Byrne (eDiscovery solutions director at Axiom), with Herb Roitblat (of OrcaTec) as plaintiff’s eDiscovery consultant.

Predictive coding is the hottest topic in the eDiscovery world, and its promises include increased search accuracy for relevant documents, decreased cost and time spent on manual review, and possibly greater insight into an organization’s corpus of data, allowing for more strategic decision making with regard to early case assessment. The practical implications of predictive coding are still to be determined, and programs like this one will flesh out some of those issues before they reach the courts, which is good for practitioners and judges alike. Stay tuned for an analysis of the arguments, as well as a link to the video.

Will Predictive Coding Live Up to the eDiscovery Hype?

Monday, May 14th, 2012

The myriad published material regarding predictive coding technology has almost universally promised reduced costs and lighter burdens for the eDiscovery world. Indeed, until the now-famous order was issued in the Da Silva Moore v. Publicis Groupe case “approving” the use of predictive coding, many in the industry had parroted this “lower costs/lighter burdens” mantra like the retired athletes who chanted “tastes great/less filling” during the 1970s Miller Lite commercials. But a funny thing happened on the way to predictive coding satisfying the cost-cutting mandate of Federal Rule of Civil Procedure 1: the same old eDiscovery story of high costs and lengthy delays is plaguing the initial deployments of this technology. The three publicized cases involving predictive coding are particularly instructive on this early but troubling development.

Predictive Coding Cases

In Moore v. Publicis Groupe, the plaintiffs’ attempt to recuse Judge Peck has diverted the spotlight from the costs and delays associated with the use of predictive coding. Indeed, the parties have been wrangling for months over the parameters of using this technology for defendant MSL’s document review. During that time, each side has incurred substantial attorney fees and other costs to address fairly routine review issues. These delays figure to continue, as the parties now project that MSL’s production will not be complete until September 7, 2012. Even that projection seems overly optimistic, particularly given Judge Peck’s recent observation about the slow pace of production: “You’re now woefully behind schedule already at the first wave.” Moreover, Judge Peck has suggested on multiple occasions that a special master be appointed to address disagreements over relevance designations. Special masters, production delays, additional briefings and related court hearings all lead to the inescapable conclusion that the parties will be saddled with a huge eDiscovery bill (despite presumptively lower review costs) due to the use of predictive coding technology.

The Kleen Products v. Packaging Corporation case is also plagued by cost and delay issues. As explained in our post on this case last month, the plaintiffs are demanding a “do-over” of the defendants’ document production, insisting that predictive coding technology be used instead of keyword search and other analytical tools. Setting aside the merits of plaintiffs’ arguments, the costs the parties have incurred in connection with this motion are quickly mounting. After the parties submitted briefings on the issues, the court has now held two hearings on the matter, including a full day of testimony from the parties’ experts. With another “Discovery Hearing” now on the docket for May 22nd, predictive coding has essentially turned an otherwise routine document production dispute into an expensive, time-consuming sideshow with no end in sight.

Cost and delay issues may very well trouble the parties in the Global Aerospace v. Landow Aviation matter, too. In Global Aerospace, the court acceded to the defendants’ request to use predictive coding technology over the plaintiffs’ objections. Despite allowing the use of such technology, the court provided plaintiffs with the opportunity to challenge the “completeness or the contents of the production or the ongoing use of predictive coding technology.” Such a condition essentially invites plaintiffs to re-litigate their objections through motion practice. Moreover, like the proverbial “exception that swallows the rule,” the order allows for the possibility that the court could withdraw its approval of predictive coding technology. All of which could lead to seemingly endless discovery motions, production “re-dos” and inevitable cost and delay issues.

Better Times Ahead?

At present, the Da Silva Moore, Kleen Products and Global Aerospace cases do not suggest that predictive coding technology will “secure the just, speedy, and inexpensive determination of every action and proceeding.” Nevertheless, there is room for considerable optimism that predictive coding will ultimately succeed. Technological advances in the industry will provide the transparency into the black box of predictive coding that has so far been missing. Additional advances should also lead to easy-to-use workflow management consoles, which will in turn increase the defensibility of the process and satisfy legitimate concerns regarding production results, such as those raised by the plaintiffs in Moore and Global Aerospace.

Technological advances that also increase the accuracy of first generation predictive coding tools should yield greater understanding and acceptance about the role predictive coding can play in eDiscovery. As lawyers learn to trust the reliability of transparent predictive coding, they will appreciate how this tool can be deployed in various scenarios (e.g., prioritization, quality assurance for linear review, full scale production) and in connection with existing eDiscovery technologies. In addition, such understanding will likely facilitate greater cooperation among counsel, a lynchpin for expediting the eDiscovery process. This is evident from the Moore, Kleen Products and Global Aerospace cases, where a lack of cooperation has caused increased costs and delays.

With the promise of transparency and simpler workflows, predictive coding technology should eventually live up to its billing of helping organizations discover their information in an efficient, cost-effective and defensible manner. For now, though, the “promise” of first-generation predictive coding tools appears to be nothing more than that, leaving organizations looking like the cash-strapped “Monopoly man,” wondering where their litigation dollars have gone.

Look Before You Leap! Avoiding Pitfalls When Moving eDiscovery to the Cloud

Monday, May 7th, 2012

It’s no surprise that the eDiscovery frenzy gripping the American legal system over the past decade has become increasingly expensive.  Particularly costly to organizations is the process of preserving and collecting documents, a fact repeatedly emphasized by the Advisory Committee in its report regarding the 2006 amendments to the Federal Rules of Civil Procedure (FRCP).  These aspects of discovery are often lengthy and can be disruptive to business operations.  Just as troubling, they increase the duration and expense of litigation.

Because these costs and delays affect the courts as well as clients, it comes as no surprise that judges have heightened their expectations for how organizations store, manage and discover their electronically stored information (ESI). Gone are the days when enterprises could plead ignorance as an excuse for not preserving or producing their data in an efficient, cost-effective and defensible manner. Organizations must now follow best practices – both before and during litigation – if they are to safely navigate the stormy seas of eDiscovery.

The importance of deploying such practices applies acutely to those organizations that are exploring “cloud”-based alternatives to traditional methods for preserving and producing electronic information.  Under the right circumstances, the cloud may represent a fantastic opportunity to streamline the eDiscovery process for an organization.  Yet it could also turn into a dangerous liaison if the cloud offering is not properly scrutinized for basic eDiscovery functionality.  Indeed, the City of Los Angeles’s recent decision to partially disengage from its cloud service provider exemplifies this admonition to “look before you leap” to the cloud.  Thus, before selecting a cloud provider for eDiscovery, organizations should be particularly careful to ensure that a provider has the ability both to efficiently retrieve data from the cloud and to issue litigation hold notices.

Effective Data Retrieval Requires Efficient Data Storage

The hype surrounding the cloud has generally focused on the opportunity for cheap and unlimited storage of information.  Storage, however, is only one of many factors to consider in selecting a cloud-based eDiscovery solution.  To be able to meet the heightened expectations of courts and regulatory bodies, organizations must have the actual – not theoretical – ability to retrieve their data in real time.  Otherwise, they may not be able to satisfy eDiscovery requests from courts or regulatory bodies, let alone the day-to-day demands of their operations.

A key step to retrieving company data in a timely manner is to first confirm whether the cloud offering can intelligently organize that information such that organizations can quickly respond to discovery requests and other legal demands.  This includes the capacity to implement and observe company retention protocols.  Just like traditional data archiving software, the cloud must enable automated retention rules and thus limit the retention of information to a designated time period.  This will enable data to be expired once it reaches the end of that period.

The pool of data can be further decreased through single instance storage.  This deduplication technology eliminates redundant data by preserving only a master copy of each document placed into the cloud.  This will reduce the amount of data that needs to be identified, preserved, collected and reviewed as part of any discovery process.  For while unlimited data storage may seem ideal now, reviewing unlimited amounts of data will quickly become a logistical and costly nightmare.
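
At its core, single instance storage amounts to keeping one master copy per unique content fingerprint and recording references to it. The following is a minimal sketch under that assumption, with illustrative class and method names.

```python
import hashlib

class SingleInstanceStore:
    """Minimal sketch of single instance storage: identical content is stored once,
    and subsequent copies only add a reference to the existing master."""

    def __init__(self) -> None:
        self._masters = {}     # content hash -> document bytes
        self._references = {}  # content hash -> list of source identifiers

    def ingest(self, source_id: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._masters:
            self._masters[digest] = content  # first copy becomes the master
        self._references.setdefault(digest, []).append(source_id)
        return digest

    def master_count(self) -> int:
        return len(self._masters)

store = SingleInstanceStore()
store.ingest("mailbox-a/msg-001.eml", b"Quarterly results attached.")
store.ingest("mailbox-b/msg-107.eml", b"Quarterly results attached.")  # duplicate content
print(store.master_count())  # 1 master copy, 2 references
```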

Any viable cloud offering should also have the ability to suspend automated document retention/deletion rules to ensure the adequate preservation of relevant information.  This goes beyond placing a hold on archival data in the cloud.  It requires that an organization have the ability to identify the data sources in the cloud that may contain relevant information and then modify aspects of its retention policies to ensure that cloud-stored data is retained for eDiscovery.  Taking this step will enable an organization to create a defensible document retention strategy and be protected from court sanctions under the Federal Rule of Civil Procedure 37(e) “safe harbor.”  The decision from Viramontes v. U.S. Bancorp (N.D. Ill. Jan. 27, 2011) is particularly instructive on this issue.

In Viramontes, the defendant bank defeated a sanctions motion because it timely modified aspects of its email retention policy.  The bank implemented a policy that kept emails for 90 days, after which the emails were deleted.  That policy was promptly suspended, however, once litigation was reasonably foreseeable.  Because the bank followed that procedure in good faith, it was protected from sanctions under Rule 37(e).

As the Viramontes case shows, an organization can be prepared for eDiscovery disputes by appropriately suspending aspects of its document retention policies.  By creating and then faithfully observing a policy that requires retention policies be suspended on the occurrence of litigation or other triggering event, an organization can develop a defensible retention procedure. Having such eDiscovery functionality in a cloud provider will likely facilitate an organization’s eDiscovery process and better insulate it from litigation disasters.
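
The pattern described above – automated expiry that defers to active legal holds before anything is deleted – can be sketched roughly as follows; the matter names, custodians and retention period are invented for illustration.

```python
from datetime import datetime, timedelta

RETENTION_PERIOD = timedelta(days=90)  # e.g., a 90-day email policy like the one in Viramontes

# Active legal holds: matter name -> custodians whose data must be preserved.
active_holds = {"acme-v-widgetco": {"jdoe", "asmith"}}

def eligible_for_deletion(custodian: str, ingested_at: datetime, now: datetime) -> bool:
    """Delete only documents that are past retention AND whose custodian is not on hold."""
    on_hold = any(custodian in custodians for custodians in active_holds.values())
    expired = now - ingested_at > RETENTION_PERIOD
    return expired and not on_hold

now = datetime(2012, 5, 7)
old_message = datetime(2012, 1, 2)
print(eligible_for_deletion("jdoe", old_message, now))    # False: custodian is on hold
print(eligible_for_deletion("bwayne", old_message, now))  # True: expired and not on hold
```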

The Ability to Issue Litigation Hold Notices

To be effective for eDiscovery purposes, a cloud service provider must also enable an organization to deploy a litigation hold to prevent users from destroying data. Unless the cloud has litigation hold technology, the entire discovery process may very well collapse.  For electronic data to be produced in litigation, it must first be preserved.  And it cannot be preserved if the key players or data source custodians are unaware that such information must be retained.  Indeed, employees and data sources may discard and overwrite electronically stored information if they are oblivious to a preservation duty.

A cloud service provider should therefore enable automated legal hold acknowledgements.  Such technology will allow custodians to be promptly and properly notified of litigation and thereby retain information that might otherwise have been discarded.  Inadequate litigation hold technology leaves organizations vulnerable to data loss and court punishment.
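
At its simplest, the acknowledgement tracking described here amounts to recording which custodians have been notified and which have confirmed. The sketch below illustrates only that bookkeeping; the notification channel itself (email, portal, reminders) is out of scope and the names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

class LegalHold:
    """Minimal sketch of issuing a hold notice and tracking custodian acknowledgements."""

    def __init__(self, matter: str, custodians: List[str]) -> None:
        self.matter = matter
        self.issued_at = datetime.utcnow()
        self.acknowledged: Dict[str, bool] = {c: False for c in custodians}

    def acknowledge(self, custodian: str) -> None:
        self.acknowledged[custodian] = True

    def outstanding(self) -> List[str]:
        """Custodians who have not confirmed the hold - candidates for reminders or escalation."""
        return [c for c, done in self.acknowledged.items() if not done]

hold = LegalHold("Acme v. WidgetCo", ["jdoe", "asmith", "bwayne"])
hold.acknowledge("jdoe")
print(hold.outstanding())  # ['asmith', 'bwayne']
```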

Conclusion

Confirming that a cloud offering can quickly retrieve and efficiently store enterprise data while effectively deploying litigation hold notices will likely address the basic concerns regarding its eDiscovery functionality. Yet these features alone will not make that solution the model of eDiscovery cloud providers. Advanced search capabilities should also be included to reduce the amount of data that must be analyzed and reviewed downstream. In addition, the cloud ought to support load files in compatible formats for export to third party review software. The cloud should additionally provide an organization with a clear audit trail establishing that neither its documents, nor their metadata were modified when transmitted to the cloud.  Without this assurance, an organization may not be able to comply with key regulations or establish the authenticity of its data in court. Finally, ensure that these provisions are memorialized in the service level agreement governing the relationship between the organization and the cloud provider.