Archive for the ‘review’ Category

Dueling Predictive Coding for Dummies Books Part Deux

Friday, December 7th, 2012

Long-time readers of the eDiscovery 2.0 blog know we like to take advantage of every opportunity to discuss Charlie Sheen and eDiscovery. While Charlie Sheen’s antics may have died down, the evolution and discussion of eDiscovery technology continues unabated. Thanks to Sharon Nelson and a recent post on her Ride the Lightning blog, we’ve decided there is no way we can pass up the opportunity to stretch the Charlie Sheen/eDiscovery analogy once again.

In the 1993 movie Hot Shots! Part Deux, Charlie Sheen plays the lead in a parody of the Rambo movies that helped make Sylvester Stallone a Hollywood icon. Not surprisingly, the parody is played for comedic value and is a far cry from the originals. In recent months, those in the litigation community have watched an analogous situation play out with two competing books about predictive coding technology.

In September, the legal publication ALM (American Lawyer Media) reported that two competing Predictive Coding for Dummies books had been published, one by Symantec and one by Recommind.

The ALM article, titled Predictive Coding Vendors Duel for ‘Dummies’, did not provide an in-depth analysis of either book, but a recent post on Ride the Lightning by Terry Dexter provided the analysis and review of both books that many have eagerly anticipated. The conclusion? The Predictive Coding for Dummies sequel is a far cry from the original.

Here is the actual text of Mr. Dexter’s analysis for your reading pleasure:

Predictive Coding For Dummies®, Symantec(TM) Special Edition by Matthew D. Nelson, Esq. Copyright © 2012 from John Wiley & Sons, Inc. 111 River St. Hoboken, NJ 07030-5774 ISBN 978-1-118-48198-1 (pbk); ISBN 978-1-118-48237-7 (ebk)

Predictive Coding For Dummies®, Recommind Special Edition, author(s) not listed, Copyright © 2013 from John Wiley & Sons, Inc. 111 River St. Hoboken, NJ 07030-5774 ISBN 978-1-118-52167-0 (pbk); ISBN 978-1-118-52230-1 (ebk)

Not being known as someone who won’t accept a challenge, I read both books cover to cover (several times). In full disclosure, I am not an attorney (nor have I played one on TV); I am simply a techno-geek with a Bachelor of Arts in English and a strong interest in the tools, techniques and methods involved with electronic discovery (eDis). This review is based upon my reading and understanding of Predictive Coding, which, in turn, is based upon a combination of 30 years in Information Science & Technology and extensive research into the wild, woolly world of electronic discovery. Any and all comments are mine and not those of Sharon Nelson (the individual) or Sensei Enterprises, Inc.

Up first: Predictive Coding For Dummies®, Symantec(TM) Special Edition by Matthew D. Nelson, Esq.

My initial impression of this book was good. The format follows the standard “Dummies” format and structure while legal and technical concepts are presented in a clear, easily understood manner. Nelson’s writing flows from one paragraph to another and doesn’t introduce new terms without first explaining them. The reader is immediately informed as to the what and why of electronic discovery.  From the third paragraph onward, the reader is gradually immersed into a sometimes murky world.

This excerpt from the Introduction sets the tone:

“Predictive coding technology is a new approach to attorney document review that can be used to help legal teams significantly reduce the time and cost of eDiscovery. Despite the promise of predictive coding technology, the technology is relatively new to the legal field, and significant confusion about the proper use of these tools is pervasive. This book helps eliminate that confusion by providing a wealth of information about predictive coding technology, related terminology, and the proper use of these tools.”

Specific comments:

Beyond the excellent writing, this book contains many positives and negatives, some of which I present here.

Positives:

  1. The cost in terms of timeliness, accuracy and productivity is compared to manual review. 
  2. Nelson introduces the Electronic Discovery Reference Model (EDRM) within the first three (3) pages. The subsequent discussion of potential costs is emphasized by illustrating the enormity of the potential volume of Electronically Stored Information (ESI). This early introduction is also valuable when process defensibility is introduced.
  3. The concepts of sanctions, privileged information, human vs. machine reading/review and risk are clearly distinguished. Again, the “whys” behind such concepts, familiar to any first-year law student, are made easily understandable for the layperson.
  4. The inclusion of website addresses to provide additional information is most welcome. Indeed, references to a predictive coding cost estimate page and to a Ralph Losey article helped me gain a deeper understanding of the planning and execution of a PC effort.
  5. A separate step in Nelson’s workflow considers Privileged Information. While no one on either side of a litigation struggle wants to divulge such data, it can and does happen. Predictive Coding is not presented as a palliative cure-all for such ‘ooopsies’; however, the book goes far in helping the reader comprehend the necessity of conducting separate actions to reduce, if not eliminate, the probability of such an event occurring.
  6. The three prominent eDis cases (Da Silva Moore, Kleen Products and Global Aerospace) are discussed relative to First Generation PC tools and Judicial Guidance.

Negatives:

  1. Clearly, this book is written and produced to influence litigators and law firms to orient themselves toward Symantec and Clearwell. Hints are subtly placed throughout the book. While no names are explicitly mentioned, the implication is clear and becomes more obvious starting at Chapter 6. More neutral, objective content would make more sense to someone who is already familiar with the eDis process.
  2. There is no discussion of the difficulties of using Optical Character Recognition (OCR) or of ESI based on different character sets. All data is presumed to be 100% compatible and ANSI compliant.

Recommendation:

This is an excellent book to give to clients, new litigation support personnel, paralegals, etc. involved in the beginnings of any litigation where the use of Electronic Discovery tools is likely.

Next up: Predictive Coding For Dummies®, Recommind Special Edition author(s) not listed

My initial impression was guarded. The format follows the standard “Dummies” format and structure, but the content reads like someone mashed several marketing ‘White Papers’ together. This impression is further supported by comparing its copyright date with the Symantec book’s. Indeed, comparing these tomes is like comparing apples to oranges.

Positives:

  1. It’s short.

Negatives:

  1. Only nine (9) pages (25%) have any direct relationship with the subject matter. Twenty-eight (28) pages (~77%) are more closely related to marketing collateral. The very topic of Predictive Coding is introduced to the reader at page 11!
  2. The reader is constantly bombarded with the cost differential between manual and automated document review. Figure 1-2 in this book compares savings in 3 types of cases (IP, Second Request & Tort). Linear (Manual) Review is compared to Predictive Coding and, of course, PC wins every time. However, there is no mention of how the PC effort was conducted (and its related costs) – were documents reviewed in house or by a services provider?
  3. There is zero mention of risk, sanctions or privileged information. In fact, a reader may develop the idea that any Predictive Coding tool takes care of any such occurrences.
  4. There is no discussion of the difficulties of using Optical Character Recognition (OCR) or of ESI based on different character sets. All ESI is presumed to be 100% compatible and ANSI compliant.
  5. What are ‘Frankenstacks’? This book is supposed to help IT Managers who already understand the hurdles of application incompatibility.
  6. The book is very difficult to read. The workflow discussion does not follow the accompanying diagram (Figure 2-1) and even introduces the concept of ‘Predictive Analysis’ without any further discussion.
  7. The book makes blatant reference to Recommind’s product. Indeed the content of the entire document builds to the conclusion that only Recommind has the capability to successfully conduct electronic discovery.

Recommendation:

This is a very poorly written book using a style that insults the reader’s intelligence. A cursory Bing or Google search would be a better investment of time and money.”

Interestingly, only one day after Mr. Dexter’s review, another review by Jeffrey Reed was posted to Ride the Lightning criticizing both books. For those of us in a profession that thrives on advocacy, it probably comes as no surprise that two people could have different views of the same book. Unfortunately, inconsistent reviews might lead some to wonder which book they should read. The good news is that both books are free, so we invite you to read them both and draw your own conclusions. As always, we also invite your feedback.

To download a copy of Symantec’s Predictive Coding for Dummies book click here.

New Gartner Report Spotlights Significance of Email Archiving for Defensible Deletion

Thursday, November 1st, 2012

Gartner recently released a report that spotlights the importance of using email archiving as part of an organization’s defensible deletion strategy. The report – Best Practices for Using Email Archiving to Eliminate PST and Mailbox Quota Headaches (Alan Dayley, September 21, 2012) – specifically focuses on the information retention and eDiscovery challenges associated with email storage on Microsoft Exchange and how email archiving software can help address these issues. As Gartner makes clear in its report, an archiving solution can provide genuine opportunities to reduce the costs and risks of email hoarding.

The Problem: PST Files

The primary challenge that many organizations are experiencing with Microsoft Exchange email is the unchecked growth of messages stored in personal storage table (PST) files. Used to bypass storage quotas on Exchange, PST files are problematic because they increase the costs and risks of eDiscovery while circumventing information retention policies.

That the unrestrained growth of PST files could create problems downstream for organizations should come as no surprise. Various court decisions have addressed this issue, with the DuPont v. Kolon Industries litigation foremost among them. In the DuPont case, a $919 million verdict and 20-year product injunction largely stemmed from the defendant’s inability to prevent the destruction of thousands of pages of email formerly stored in PST files. That spoliation resulted in an adverse inference instruction to the jury and the ensuing verdict against the defendant.

The Solution: Eradicate PSTs with the Help of Archiving Software and Retention Policies

To address the PST problem, Gartner suggests following a three-step process to help manage and then eradicate PSTs from the organization. This includes educating end users regarding both the perils of PSTs and the ease of access to email through archiving software. It also involves disabling the creation of new PSTs, a process that should ultimately culminate with the elimination of existing PSTs.

In connection with this process, Gartner suggests deployment of archiving software with a “PST management tool” to facilitate the eradication process. With the assistance of the archiving tool, existing PSTs can be discovered and migrated into the archive’s central data repository. Once there, email retention policies can begin to expire stale, useless and even harmful messages that were formerly outside the company’s information retention framework.
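The archiving products Gartner describes handle this discovery and migration automatically, but the basic first step can be illustrated. Below is a minimal, hypothetical Python sketch that inventories PST files on a file share so they can be migrated into an archive and then retired; the root path, output file name and owner heuristic are illustrative assumptions, not part of any particular product.

```python
import csv
import os

# Hypothetical inventory pass: walk user directories and record any PST
# files found so they can be migrated into the archive and then retired.
ROOT = r"\\fileserver\userdata"      # assumed file share to scan
INVENTORY = "pst_inventory.csv"      # assumed output file

with open(INVENTORY, "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "size_mb", "owner_hint"])
    for dirpath, _dirnames, filenames in os.walk(ROOT):
        for name in filenames:
            if name.lower().endswith(".pst"):
                full = os.path.join(dirpath, name)
                size_mb = round(os.path.getsize(full) / (1024 * 1024), 1)
                # Use the top-level folder under ROOT as a rough owner hint.
                owner_hint = os.path.relpath(full, ROOT).split(os.sep)[0]
                writer.writerow([full, size_mb, owner_hint])
```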

With respect to the development of retention policies, organizations should consider engaging in a cooperative internal process involving IT, compliance, legal and business units. These key stakeholders must be engaged and collaborate if workable policies are to be created. The actual retention periods should take into account the types of email generated and received by an organization, along with the enterprise’s business, industry and litigation profile.

To ensure successful implementation of such retention policies and also address the problem of PSTs, an organization should explore whether an on-premises or cloud archiving solution is a better fit for its environment. While each method has its advantages, Gartner advises organizations to consider whether certain key features are included with a particular offering:

Email classification. The archiving tool should allow your organization to classify and tag the emails in accordance with your retention policy definitions, including user-selected, user/group, or key-word tagging.

User access to archived email. The tool must also give end users appropriate and user-friendly access to their archived email, thus eliminating concerns over their inability to manage their email storage with PSTs.

Legal and information discovery capabilities. The search, indexing, and e-discovery capabilities of the archiving tool should also match your needs or enable integration into corporate e-discovery systems.

While perhaps not a panacea for the storage and eDiscovery problems associated with email, on-premises or cloud archiving software should provide various benefits to organizations. Indeed, such technologies have the potential to help organizations store, manage and discover their email efficiently, cost effectively and in a defensible manner. Where properly deployed and fully implemented, organizations should be able to reduce the nettlesome costs and risks connected with email.

Federal Directive Hits Two Birds (RIM and eDiscovery) with One Stone

Thursday, October 18th, 2012

The eagerly awaited Directive from the Office of Management and Budget (OMB) and the National Archives and Records Administration (NARA) was released at the end of August. In an attempt to go behind the scenes, we’ve asked the Project Management Office (PMO) and the Chief Records Officer at NARA to respond to a few key questions.

We know that the Presidential Mandate was the impetus for the agency self-assessments that were submitted to NARA. Now that NARA and the OMB have distilled those reports, what are the biggest challenges going forward for the government regarding recordkeeping, information governance and eDiscovery?

“In each of those areas, the biggest challenge that can be identified is the rapid emergence and deployment of technology. Technology has changed the way Federal agencies carry out their missions and create the records required to document that activity. It has also changed the dynamics in records management. In the past, agencies would maintain central file rooms where records were stored and managed. Now, with distributed computing networks, records are likely to be in a multitude of electronic formats, on a variety of servers, and exist as multiple copies. Records management practices need to move forward to solve that challenge. If done right, good records management (especially of electronic records) can also be of great help in providing a solid foundation for applying best practices in other areas, including in eDiscovery, FOIA, as well as in all aspects of information governance.”    

What is the biggest action item from the Directive for agencies to take away?

“The Directive creates a framework for records management in the 21st century that emphasizes the primacy of electronic information and directs agencies to begin transforming their current processes to identify and capture electronic records. One milestone is that by 2016, agencies must be managing their email in an electronically accessible format (with tools that make this possible, not printing out emails to paper). Agencies should begin planning for the transition, where appropriate, from paper-based records management processes to those that preserve records in an electronic format.

The Directive also calls on agencies to designate a Senior Agency Official (SAO) for Records Management by November 15, 2012. The SAO is intended to raise the profile of records management in an agency to ensure that each agency commits the resources necessary to carry out the rest of the goals in the Directive. A meeting of SAOs is to be held at the National Archives with the Archivist of the United States convening the meeting by the end of this year. Details about that meeting will be distributed by NARA soon.”

Does the Directive holistically address information governance for the agencies, or is it likely that agencies will continue to deploy different technology even within their own departments?

“In general, as long as agencies are properly managing their records, it does not matter what technologies they are using. However, one of the drivers behind the issuance of the Memorandum and the Directive was identifying ways in which agencies can reduce costs while still meeting all of their records management requirements. The Directive specifies actions (see A3, A4, A5, and B2) in which NARA and agencies can work together to identify effective solutions that can be shared.”

Finally, although FOIA requests have increased and the backlog has decreased, how will litigation and FOIA intersect in the next, say, five years? We know from the retracted decision in NDLON that metadata still remains an issue for the government…are we getting to a point where records created electronically will be able to be produced electronically as a matter of course for FOIA litigation/requests?

“In general, an important feature of the Directive is that the Federal government’s record information – most of which is in electronic format – stays in electronic format. Therefore, all of the inherent benefits will remain as well – i.e., metadata being retained, easier and speedier searches to locate records, and efficiencies in compilation, reproduction, transmission, and reduction in the cost of producing the requested information. This all would be expected to have an impact in improving the ability of federal agencies to respond to FOIA requests by producing records in electronic formats.”

Fun fact: Is NARA really saving every tweet produced?

“Actually, the Library of Congress is the agency that is preserving Twitter. NARA is interested in only preserving those tweets that a) were made or received in the course of government business and b) appraised to have permanent value. We talked about this on our Records Express blog.”

“We think President Barack Obama said it best when he made the following comment on November 28, 2011:

“The current federal records management system is based on an outdated approach involving paper and filing cabinets. Today’s action will move the process into the digital age so the American public can have access to clear and accurate information about the decisions and actions of the Federal Government.”

Paul Wester, Chief Records Officer at the National Archives, has stated that this Directive is very exciting for the Federal Records Management community. In our lifetime none of us has experienced the attention to the challenges that we encounter every day in managing our records management programs like we are now. These are very exciting times to be a records manager in the Federal government. Full implementation of the Directive by the end of this decade will take a lot of hard work, but the government will be better off for doing this and we will be better able to serve the public.”

Special thanks to NARA for the ongoing dialogue that is key to transparent government and the effective practice of eDiscovery, Freedom Of Information Act requests, records management and thought leadership in the government sector. Stay tuned as we continue to cover these crucial issues for the government as they wrestle with important information governance challenges. 


Defensible Deletion: The Cornerstone of Intelligent Information Governance

Tuesday, October 16th, 2012

The struggle to stay above the rising tide of information is a constant battle for organizations. Not only are the costs and logistics associated with data storage more troubling than ever, but so are the potential legal consequences. Indeed, the news headlines are constantly filled with horror stories of jury verdicts, court judgments and unreasonable settlements involving organizations that failed to effectively address their data stockpiles.

While there are no quick or easy solutions to these problems, an increasingly popular method for dealing with these issues is an organizational strategy referred to as defensible deletion. A defensible deletion strategy can encompass many things, but at its core it is a comprehensive approach that companies implement to reduce the storage costs and legal risks associated with the retention of electronically stored information (ESI). Organizations that have done so have been successful in avoiding court sanctions while at the same time eliminating ESI that has little or no business value.

The first step to implementing a defensible deletion strategy is for organizations to ensure that they have a top-down plan for addressing data retention. This typically requires that their information governance principals – legal and IT – are cooperating with each other. These departments must also work jointly with records managers and business units to decide what data must be kept and for what length of time. All such stakeholders in information retention must be engaged and collaborate if the organization is to create a workable defensible deletion strategy.

Cooperation between legal and IT naturally leads the organization to establish records retention policies, which carry out the key players’ decisions on data preservation. Such policies should address the particular needs of an organization while balancing them against litigation requirements. Not only will that enable a company to reduce its costs by decreasing data proliferation, but it will also minimize the company’s litigation risks by allowing it to limit the amount of potentially relevant information available for current and follow-on litigation.

In like manner, legal should work with IT to develop a process for how the organization will address document preservation during litigation. This will likely involve the designation of officials who are responsible for issuing a timely and comprehensive litigation hold to custodians and data sources. This will ultimately help an organization avoid the mistakes that often plague document management during litigation.

The Role of Technology in Defensible Deletion

In the digital age, an essential aspect of a defensible deletion strategy is technology. Indeed, without innovations such as archiving software and automated legal hold acknowledgements, it will be difficult for an organization to achieve its defensible deletion objectives.

On the information management side of defensible deletion, archiving software can help enforce an organization’s retention policies and thereby reduce data volume and related storage costs. This can be accomplished with classification tools, which intelligently analyze and tag data content as it is ingested into the archive. By so doing, organizations may retain information that is significant or that otherwise must be kept for business, legal or regulatory purposes – and nothing else.

An archiving solution can also reduce costs through efficient data storage. By expiring data in accordance with the organization’s retention policies and by using single instance storage to eliminate ESI duplicates, archiving software frees up space on company servers for the retention of other materials and ultimately leads to decreased storage costs. Moreover, it also lessens litigation risks as it removes data available for future litigation.
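To illustrate the single instance storage idea mentioned above, here is a toy Python sketch of content-addressed deduplication: identical content is stored once, however many mailboxes reference it. This is a conceptual illustration only, not how any particular archiving product implements deduplication; the class and method names are assumptions.

```python
import hashlib

class SingleInstanceStore:
    """Toy content-addressable store: identical items are kept only once."""

    def __init__(self):
        self._blobs = {}   # digest -> stored content
        self._refs = {}    # digest -> logical references (e.g., message IDs)

    def ingest(self, message_id, content: bytes):
        digest = hashlib.sha256(content).hexdigest()
        # Store the bytes only the first time this exact content is seen.
        self._blobs.setdefault(digest, content)
        self._refs.setdefault(digest, []).append(message_id)
        return digest

    def stats(self):
        logical = sum(len(refs) for refs in self._refs.values())
        physical = len(self._blobs)
        return {"logical_items": logical, "physical_copies": physical}

store = SingleInstanceStore()
attachment = b"quarterly results deck"
for msg_id in ("msg-1", "msg-2", "msg-3"):   # same attachment sent to three mailboxes
    store.ingest(msg_id, attachment)
print(store.stats())   # {'logical_items': 3, 'physical_copies': 1}
```

In practice, a mass-mailed attachment referenced by thousands of mailboxes would occupy the archive only once, which is where the storage savings come from.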

On the eDiscovery side of defensible deletion, an eDiscovery platform with the latest in legal hold technology is often essential for enabling a workable litigation hold process. Effective platforms enable automated legal hold acknowledgements on various custodians across multiple cases. This allows organizations to confidently place data on hold through a single user action and eliminates concerns that ESI may slip through the proverbial cracks of manual hold practices.

Every day, organizations experience the costly consequences of delaying implementation of a defensible deletion program. This trend can be reversed through a common sense defensible deletion strategy which, when powered by effective, enabling technologies, can help organizations decrease the costs and risks associated with the information explosion.

Responsible Data Citizens Embrace Old World Archiving With New Data Sources

Monday, October 8th, 2012

The times are changing rapidly as the data explosion mushrooms, but the more things change, the more they stay the same. In the archiving and eDiscovery world, organizations are increasingly pushing content from multiple data sources into information archives. Email was the first data source to take the plunge into the archive, but other data sources are following quickly as we increase the amount of data we create (volume) along with the types of data sources (variety). While email is still a paramount data source for litigation, internal/external investigations and compliance – other data sources, namely social media and SharePoint, are quickly catching up.

This transformation is happening for multiple reasons. The main reason for this expansive push of different data varieties into the archive is that centralizing an organization’s data is paramount to healthy information governance. For organizations that have deployed archiving and eDiscovery technologies, the ability to archive multiple data sources is the Shangri-La they have been looking for to increase efficiency, as well as create a more holistic and defensible workflow.

Organizations can now deploy document retention policies across multiple content types within one archive and can identify, preserve and collect from the same, singular repository. No longer do separate retention policies need to apply to data that originated in different repositories. The increased ability to archive more data sources into a centralized archive provides for unparalleled storage, deduplication, document retention, defensible deletion and discovery benefits in an increasingly complex data environment.

Prior to this capability, SharePoint was another data source in the wild that needed disparate treatment. This meant that in-place legal hold, as well as insight into the corpus of data, was not as clear as it was for email. This lack of transparency into the organization’s data environment for early case assessment led to unnecessary outsourcing, over-collection and disparate, time-consuming workflows. All of the aforementioned detractors cost organizations money, resources and time that could be better utilized elsewhere.

Bringing data sources like SharePoint into an information archive increases the ability for an organization to comply with necessary document retention schedules, legal hold requirements, and the ability to reap the benefits of a comprehensive information governance program. If SharePoint is where an organization’s employees are storing documents that are valuable to the business, order needs to be brought to the repository.

Additionally, many projects are abandoned and left to die on the vine in SharePoint. These projects need to be expired and that capacity recycled for a higher business purpose. Archives can now ingest document libraries, wikis, discussion boards, custom lists, “My Sites” and SharePoint social content, enabling increased storage optimization, retention/expiration of content and eDiscovery. As a result, organizations can better manage complex projects such as migrations, versioning, site consolidations and expiration with SharePoint archiving.

Data can be analogized to a currency, where the archive is the bank. In treating data as a currency, organizations must ask themselves: why are companies valued the way they are on Wall Street? Companies that provide services, or services in combination with products, are often valued on customer lists, consumer data that can be repurposed (Facebook), and various other databases. A recent Forbes article discusses people, value and brand as predominant indicators of value.

While these valuation metrics are sound, the valuation stops short of measuring the quality of the actual data within an organization and examining whether it is organized and protected. The valuation also does not consider the risks and benefits of how the data is stored and protected, and whether or not it is searchable. The value of the data inside a company is what supports all three of the aforementioned valuations without exception. Without managing the data in an organization, not only are eDiscovery and storage costs a legal and financial risk, but the aforementioned three are compromised.

If employee data is not managed/monitored appropriately, if the brand is compromised due to lack of social media monitoring/response, or if litigation ensues without the proper information governance plan, then value is lost because value has not been assessed and managed. Ultimately, an organization is only as good as its data, and this means there’s a new asset on Wall Street – data.

Archiving email is not a new concept, and in turn it isn’t novel that data is an asset. It has just been a less understood asset because, even though massive amounts of data are created each day in organizations, storage has become cheap. SharePoint is becoming more archivable because more critical data is being stored there, including business records, contracts and social media content. Organizations cannot fear what they cannot see until they are forced by an event to go back and collect, analyze and review that data. Costs associated with this reactive eDiscovery process can range from $3,000 to $30,000 a gigabyte, compared to roughly 20 cents per gigabyte for storage. The downstream eDiscovery costs are significant, especially as organizations begin to deal in terabytes and zettabytes.
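To put those figures in perspective, here is a back-of-the-envelope calculation; the 1 TB collection size is an assumption chosen purely for illustration.

```python
# Back-of-the-envelope comparison using the per-gigabyte figures cited above.
GB_COLLECTED = 1000                      # assume roughly 1 TB pulled into review
STORAGE_PER_GB = 0.20                    # storage cost per GB
REVIEW_PER_GB = (3_000, 30_000)          # reactive eDiscovery cost range per GB

storage_cost = GB_COLLECTED * STORAGE_PER_GB
review_low, review_high = (GB_COLLECTED * cost for cost in REVIEW_PER_GB)

print(f"Storage: ${storage_cost:,.0f}")                                # $200
print(f"Reactive review: ${review_low:,.0f} - ${review_high:,.0f}")    # $3,000,000 - $30,000,000
```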

Hence, plus ça change, plus c’est la même chose: we will see this trend continue as organizations push more valuable data into the archive and expire data that has no value. Multiple data sources have been collection sources for some time, but the ease of pulling everything into an archive is allowing for economies of scale and increased defensibility regarding data management. This will decrease the risks associated with litigation and compliance, as well as boost the value of companies.

Avoiding “American Style” eDiscovery Traps Abroad

Tuesday, September 18th, 2012

The eDiscovery craze that has gripped the U.S. legal system over the past decade has been ignored in much of Europe. This is due in large part to Europe’s discovery rules, which generally forbid categorical document requests and other broad discovery procedures authorized in the U.S. by the Federal Rules of Civil Procedure (FRCP). Without the impetus created by eDiscovery demands, many companies operating in Europe may feel insulated from the need to implement an eDiscovery-oriented defensible deletion strategy. Yet the need for such a strategy – focused on reducing the costs and legal risks of storing electronic data – is universal.

With the opportunity to decrease storage costs and inefficiencies, having a laissez-faire attitude toward defensible deletion is troubling enough. But taking no action to prepare for eDiscovery is worse. That is especially the case for organizations operating in Europe that maintain offices in the United States. They may fall into an increasingly popular “American style” eDiscovery trap for unsuspecting litigants in European legal proceedings. That discovery trap is 28 U.S.C. §1782.

Section 1782 is a U.S. federal statute that enables companies involved in foreign legal proceedings to obtain discovery from parties or non-parties who have offices in the United States. A petitioning litigant under section 1782 need only make a minor threshold showing to a U.S. district court to justify the discovery. The petitioner must establish that the discovery is being sought from a person under its jurisdiction and will be used in foreign legal proceedings. In addition, the requested discovery must not offend the foreign tribunal or otherwise abuse the purposes of section 1782. This typically requires a showing that the discovery cannot be obtained from the foreign tribunal and is not an end-run around the tribunal’s rules or its country’s policies. As the U.S. Court of Appeals for the Seventh Circuit recently confirmed in Heraeus Kulzer GmbH v. Biomet, Inc., the successful petitioner “can obtain as much discovery as it could if the lawsuit had been brought in [the United States] rather than abroad.”

Section 1782 requests are on the rise, as evidenced by recent decisions issued by three separate U.S. Courts of Appeals. In Government of Ghana v. ProEnergy Services, 677 F.3d 340 (8th Cir. 2012), the plaintiff successfully used a section 1782 request to obtain documents, interrogatory answers and deposition transcripts in connection with parallel proceedings in The Hague and in Ghana. This was followed by an Eleventh Circuit order from June 2012, which allowed the defendant in an Ecuadorian arbitration to obtain documents under section 1782 from the plaintiff’s American subsidiary. And earlier this month, the Fifth Circuit revived a section 1782 petition in which a company sought documents and testimony to support its positions in an action before the High Court in the United Kingdom.

As the foregoing section 1782 jurisprudence teaches, unsuspecting enterprises can be exposed to the full spectrum of American eDiscovery headaches even in foreign legal proceedings. As a result, organizations should be prepared to address foreign eDiscovery matters in the same manner required for American litigation. This will likely include the development of a defensible deletion strategy as an essential element of the enterprise’s overall information governance plan. With defensible deletion, organizations can use effective, enabling technologies to limit the amount of potentially relevant information available for future litigation. The reduced data set will decrease storage costs in the short term and minimize legal risks over the long term. And when deployed in conjunction with an integrated approach to eDiscovery, defensible deletion can help organizations lower the costs of document review. Such an approach will likely facilitate the organization’s ability to prepare for discovery traps – American or otherwise.

From A to PC – Running a Defensible Predictive Coding Workflow

Tuesday, September 11th, 2012

So far in our ongoing predictive coding blog series, we’ve touched on the “whys” and “whats” of predictive coding, and now I’d like to address the “hows” of using this new technology. Given that predictive coding is groundbreaking technology in the world of eDiscovery, it’s no surprise that a different workflow is required in order to run the review process.

The traditional linear review process utilizes a “brute force” approach of manually reading each document and coding it for responsiveness and privilege. In order to reduce the high cost of this process, many organizations now farm out documents to contract attorneys for review. Often, however, contract attorneys possess less expertise and knowledge of the issues, which means that multiple review passes, along with additional checks and balances, are often needed in order to ensure review accuracy. This process commonly results in a significant number of documents being reviewed multiple times, which in turn increases the cost of review. When you step away from an “eyes-on review” of every document and use predictive coding to leverage the expertise of more experienced attorneys, you will naturally aim to review as few documents as possible in order to achieve the best possible results.

How do you review the minimum number of documents with predictive coding? For starters, organizations should prepare their case for predictive coding by performing an early case assessment (ECA) in order to cull down to the review population prior to review. While some may suggest that predictive coding can be run without any ECA up front, you will actually save a significant amount of review time if you put in the effort to cull out the profoundly irrelevant documents in your case. Doing so will prevent a “junk in, junk out” situation, where leaving too much junk in the case results in unnecessarily reviewing junk documents throughout the predictive coding workflow.

Next, it is important to segregate documents that are unsuitable for predictive coding. Most predictive coding solutions operate on the extracted text content within documents. That means any documents that do not contain extracted text, such as photographs and engineering schematics, should be manually reviewed so they are not overlooked by the predictive coding engine. The same concept applies to any other document with limitations that affect review, such as encrypted and password-protected files. All of these documents should be reviewed separately so as not to miss any relevant documents.
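As a rough illustration of that segregation step, the sketch below routes documents with no extracted text, or with encryption or password protection, to a manual review queue. The document fields and their names are assumptions for the example, not any product’s schema.

```python
def route_for_review(doc):
    """Send documents the text-based engine cannot score to manual review.

    `doc` is assumed to be a dict with 'extracted_text', 'is_encrypted' and
    'is_password_protected' fields produced during processing.
    """
    no_text = not (doc.get("extracted_text") or "").strip()
    if no_text or doc.get("is_encrypted") or doc.get("is_password_protected"):
        return "manual_review"
    return "predictive_coding"

# Tiny illustrative population
population = [
    {"id": 1, "extracted_text": "Q3 pricing discussion...", "is_encrypted": False},
    {"id": 2, "extracted_text": "", "is_encrypted": False},        # scanned drawing, no text
    {"id": 3, "extracted_text": "see attachment", "is_encrypted": True},
]
manual = [d for d in population if route_for_review(d) == "manual_review"]
predictive = [d for d in population if route_for_review(d) == "predictive_coding"]
print(len(manual), "to manual review;", len(predictive), "to predictive coding")
```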

After culling down to your review population, the next step in preparing to use predictive coding is to create a Control Set by drawing a randomly selected statistical sample from the document population. Once the Control Set is manually reviewed, it will serve two main purposes. First, it will allow you to estimate the population yield, otherwise referred to as the percentage of responsive documents contained within the larger population. (The size of the Control Set may need to be adjusted to ensure the yield is properly taken into account.) Second, it will serve as your baseline for a true “apples-to-apples” comparison of your prediction accuracy across iterations as you move through the predictive coding workflow. The Control Set only needs to be reviewed once, up front, and is then used for measuring accuracy throughout the workflow.
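A minimal sketch of drawing a Control Set and estimating yield might look like the following. The sample size, the responsive count and the normal-approximation margin of error are illustrative assumptions, not a prescription for any particular matter.

```python
import math
import random

def draw_control_set(doc_ids, sample_size, seed=42):
    """Simple random sample of document IDs to serve as the control set."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)

def yield_estimate(responsive_in_sample, sample_size, z=1.96):
    """Point estimate of population yield with an approximate 95% margin of error."""
    p = responsive_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

# Illustrative numbers: a 1,500-document control set with 90 tagged responsive after review.
p, margin = yield_estimate(90, 1500)
print(f"Estimated yield: {p:.1%} +/- {margin:.1%}")   # roughly 6.0% +/- 1.2%
```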

It is essential that the documents in the Control Set are selected randomly from the entire population. While some believe that other sampling approaches give better peace of mind, they actually may result in unnecessary review. For example, other workflows recommend sampling from the documents that are not predicted to be relevant to see if anything was left behind. If you instead create a proper Control Set from the entire population, you can get precision and recall metrics that are representative of the entire population, which in turn includes the documents that are not predicted to be relevant.

Once the Control Set is created, you can begin training the software to evaluate documents against the review criteria in the case. Selecting the optimal set of documents to train the system (commonly referred to as the training set or seed set) is one of the most important steps in the entire predictive coding workflow, as it sets the initial accuracy of the system, and thus it should be chosen carefully. Some suggest creating the initial training set by taking a random sample from the population (much like how the Control Set is selected) instead of proactively selecting responsive documents. The important thing to understand, however, is that the training set must contain enough responsive documents to accurately represent them. Selecting responsive documents for inclusion in the training set matters because most eDiscovery cases have low yield – meaning the prevalence of responsive documents within the overall document population is low. If enough responsive documents are not included in the training set, the system will not be able to learn effectively how to identify responsive items.

An effective method for selecting the initial training set is to use a targeted search to locate a small set of documents (typically between 100 and 1,000) that is expected to be about 50% responsive. For example, you may choose to focus on only the key custodians in the case and use a combination of tighter keyword, date range and other search criteria. You do not have to perform exhaustive searches, but a high quality initial training set will likely minimize the amount of additional training needed to achieve high prediction accuracy.
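As a rough sketch of that kind of targeted selection, with hypothetical custodians, terms and dates standing in for real case criteria:

```python
from datetime import date

KEY_CUSTODIANS = {"jsmith", "arodriguez"}              # assumed key custodians
KEY_TERMS = ("rebate", "price fix", "margin target")   # assumed tight search terms
DATE_RANGE = (date(2011, 1, 1), date(2011, 6, 30))     # assumed relevant window

def seed_candidate(doc):
    """True when a document matches the narrow seed-set criteria.

    `doc` is assumed to be a dict with 'custodian', 'sent' (a date) and
    'extracted_text' fields; the fields and criteria are illustrative only.
    """
    in_date = DATE_RANGE[0] <= doc["sent"] <= DATE_RANGE[1]
    text = doc["extracted_text"].lower()
    return (doc["custodian"] in KEY_CUSTODIANS
            and in_date
            and any(term in text for term in KEY_TERMS))

# Usage sketch (review_population would come from the culled case data):
# seeds = [d for d in review_population if seed_candidate(d)][:1000]
```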

After the initial training set is selected, it must then be reviewed. It is extremely important that the review decisions made on any training items are as accurate as possible, since the system will be learning from these items, which typically means that the more experienced case attorneys should be used for this review. Once review is finished on all of the training documents, the system can learn from the tagging decisions in order to predict the responsiveness or non-responsiveness of the remaining documents.

While you can now predict on all of the other documents in the population, it is most important to predict on the Control Set at this time. Not only may this decision be more time effective than applying predictions to all the documents in the case, but you will need predictions on all of the documents in the Control Set in order to assess the accuracy of the predictions. With predictions and tagging decisions on each of the Control Set documents, you will be able to get accurate precision and recall metrics that you can extrapolate to the entire review population.
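Computing those metrics from the reviewed Control Set is straightforward. Here is a small sketch, assuming each Control Set document carries the reviewer's tag and the engine's prediction; the field names and counts are illustrative.

```python
def precision_recall(control_set):
    """Compute precision and recall over a reviewed control set.

    Each item is assumed to be a dict with boolean 'human_responsive'
    (the reviewer's tag) and 'predicted_responsive' (the engine's call).
    """
    tp = sum(d["human_responsive"] and d["predicted_responsive"] for d in control_set)
    fp = sum((not d["human_responsive"]) and d["predicted_responsive"] for d in control_set)
    fn = sum(d["human_responsive"] and (not d["predicted_responsive"]) for d in control_set)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example with made-up counts: 70 true positives, 20 false positives, 20 false negatives.
control = ([{"human_responsive": True, "predicted_responsive": True}] * 70 +
           [{"human_responsive": False, "predicted_responsive": True}] * 20 +
           [{"human_responsive": True, "predicted_responsive": False}] * 20)
print(precision_recall(control))   # (0.778, 0.778) approximately
```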

At this point, the accuracy of the predictions is unlikely to be optimal, and thus the iterative process begins. In order to increase the accuracy, you must select additional documents to use for training the system. Much like the initial training set, this additional training set must also be selected carefully. The best documents to use for an additional training set are those that the system would be unable to accurately predict. Rather than choosing these documents manually, the software is often able to determine this set mathematically, more effectively than human reviewers could. Once these documents are selected, you simply continue the iterative process of training, predicting and testing until your precision and recall are at an acceptable point. Following this workflow will result in a set of documents identified as responsive by the system, along with trustworthy and defensible accuracy metrics.
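One common way software selects such documents is uncertainty sampling: picking the documents whose responsiveness scores sit closest to the decision boundary. The sketch below illustrates the idea only; the scoring model, batch size and stopping targets are assumptions, not a description of any specific product.

```python
def select_next_training_batch(scored_docs, batch_size=200):
    """Pick the documents the model is least certain about for the next round.

    `scored_docs` is assumed to be a list of (doc_id, responsiveness_score)
    pairs with scores between 0 and 1; scores nearest 0.5 are least certain.
    """
    return [doc_id for doc_id, score in
            sorted(scored_docs, key=lambda pair: abs(pair[1] - 0.5))[:batch_size]]

# Iterative loop, at a pseudocode level (model, review and control_set are placeholders):
# while precision < target_precision or recall < target_recall:
#     batch = select_next_training_batch(model.score(unreviewed_docs))
#     review(batch)                     # senior attorneys tag the batch
#     model.train(all_reviewed_docs)    # retrain on everything reviewed so far
#     precision, recall = precision_recall(model.predict(control_set))
```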

You cannot simply produce all of these documents at this point, however. The documents must still go through a privilege screen in order to remove any documents that should not be produced, and also go through any other review measures that you usually take on your responsive documents. This does, however, open up the possibility of applying additional rounds of predictive coding on top of this set of responsive documents. For example, after running the privilege screen, you can train on the privileged tag and attempt to identify additional privileged documents in your responsive set that were missed.

The important thing to keep in mind is that predictive coding is meant to strengthen your current review workflows. While we have outlined one possible workflow that utilizes predictive coding, the flexibility of the technology lends itself to be utilized for a multitude of other uses, including prioritizing a linear review. Whatever application you choose, predictive coding is sure to be an effective tool in your future reviews.

Falcon Discovery Ushers in Savings with Transparent Predictive Coding

Tuesday, September 4th, 2012

The introduction of Transparent Predictive Coding to Symantec’s Clearwell eDiscovery Platform helps organizations defensibly reduce the time and cost of document review. Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input. As expert reviewers tag documents in a training set, the software identifies common criteria across those documents, which it uses to “predict” the responsiveness of the remaining case documents. The result is that fewer irrelevant and non-responsive documents need to be reviewed manually – thereby accelerating the review process, increasing accuracy and allowing organizations to reduce the time and money spent on traditional page-by-page attorney document review.
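The internals of Transparent Predictive Coding are not described here, but the general machine learning idea can be sketched generically. The example below trains a simple text classifier on reviewer tags using scikit-learn; it is an illustration of the concept only, not Symantec's actual algorithm, and the training texts and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training set: text of reviewed documents and the attorneys' tags.
train_texts = [
    "Please revise the rebate schedule before the customer call",
    "Company picnic is moved to Friday, bring your family",
    "Margin targets for Q3 attached, do not forward externally",
    "Cafeteria menu for next week",
]
train_labels = [1, 0, 1, 0]   # 1 = responsive, 0 = not responsive

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)

model = LogisticRegression()
model.fit(X_train, train_labels)

# Score unreviewed documents; higher scores suggest likely responsiveness.
unreviewed = ["Updated rebate and margin numbers for the regional customers"]
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]
print(scores)
```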

Given the cost, speed and accuracy improvements that predictive coding promises, its adoption may seem to be a no-brainer. Yet predictive coding technology hasn’t been widely adopted in eDiscovery – largely because the technology and process itself still seems opaque and complex. Symantec’s Transparent Predictive Coding was developed to address these concerns and provide the level of defensibility necessary to enable legal teams to adopt predictive coding as a mainstream technology for eDiscovery review. Transparent Predictive Coding provides reviewers with complete visibility into the training and prediction process and delivers context for more informed, defensible decision-making.

Early adopters like Falcon Discovery have already witnessed the benefits of Transparent Predictive Coding. Falcon is a managed services provider that leverages a mix of top legal talent and cutting-edge technologies to help corporate legal departments, and the law firms that serve them, manage discovery and compliance challenges across matters. Recently, we spoke with Don McLaughlin, founder and CEO of Falcon Discovery, on the firm’s experiences with and lessons learned from using Transparent Predictive Coding.

1. Why did Falcon Discovery decide to evaluate Transparent Predictive Coding?

Predictive coding is obviously an exciting development for the eDiscovery industry, and we want to be able to offer Falcon’s clients the time and cost savings that it can deliver. At the same time there is an element of risk. For example, not all solutions provide the same level of visibility into the prediction process, and helping our clients manage eDiscovery in a defensible manner is of paramount importance. Over the past several years we have tested and/or used a number of different software solutions that include some assisted review or prediction technology. We were impressed that Symantec has taken the time and put in the research to integrate best practices into its predictive coding technology. This includes elements like integrated, dynamic statistical sampling, which takes the guesswork out of measuring review accuracy. This ability to look at accuracy across the entire review set provides a more complete picture, and helps address key issues that have come to light in some of the recent predictive coding court cases like Da Silva Moore.

2. What’s something you found unique or different from other solutions you evaluated?

I would say one of the biggest differentiators is that Transparent Predictive Coding uses both content and metadata in its algorithms to capture the full context of an e-mail or document, which we found to be appealing for two reasons. First, you often have to consider metadata during review for sensitive issues like privilege and to focus on important communications between specific individuals during specific time periods. Second, this can yield more accurate results with less work because the software has a more complete picture of the important elements in an e-mail or document. This faster time to evaluate the documents is critical for our clients’ bottom line, and enables more effective litigation risk analysis, while minimizing the chance of overlooking privileged or responsive documents.
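Again as a generic illustration rather than a description of Symantec's implementation, combining text features with metadata signals might look like the sketch below; the metadata flags are chosen purely as examples.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    {"text": "Q3 margin targets attached", "sender_is_key_custodian": 1, "sent_in_relevant_window": 1},
    {"text": "Holiday party reminder",      "sender_is_key_custodian": 0, "sent_in_relevant_window": 0},
]

# Text features from document content
text_matrix = TfidfVectorizer().fit_transform([d["text"] for d in docs])

# Simple metadata features (binary flags here; real systems would use richer signals)
meta_matrix = csr_matrix(np.array(
    [[d["sender_is_key_custodian"], d["sent_in_relevant_window"]] for d in docs],
    dtype=float,
))

# A single combined feature matrix a classifier can train on
combined = hstack([text_matrix, meta_matrix])
print(combined.shape)
```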

3. So what were some of the success metrics that you logged?

Using Transparent Predictive Coding, Falcon was able to achieve extremely high levels of review accuracy with only a fraction of the time and review effort. If you look at academic studies on linear search and review, even under ideal conditions you often get somewhere between 40-60% accuracy. With Transparent Predictive Coding we are seeing accuracy measures closer to 90%, which means we are often achieving 90% recall and 80% precision by reviewing only a small fraction – under 10% – of the data population that you might otherwise review document-by-document. For the appropriate case and population of documents, this enables us to cut review time and costs by 90% compared to pure linear review. Of course, this is on top of the significant savings derived from leveraging other technologies to intelligently cull the data to a more relevant review set, prior to even using Transparent Predictive Coding. This means that our clients can understand the key issues, and identify potentially ‘smoking gun’ material, much earlier in a case.
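As a purely illustrative calculation (the population size and yield below are assumptions, not Falcon's figures), accuracy and review-volume numbers in that range translate into something like this:

```python
# Illustrative arithmetic only; population and yield are assumed values.
population = 1_000_000          # documents after culling
yield_rate = 0.05               # assume 5% are actually responsive
recall, precision = 0.90, 0.80  # accuracy levels in the range quoted above
reviewed_fraction = 0.10        # "under 10%" of the population reviewed

responsive = population * yield_rate            # 50,000 truly responsive documents
found = responsive * recall                     # 45,000 located by the workflow
predicted_responsive = found / precision        # 56,250 documents flagged for follow-up
docs_reviewed = population * reviewed_fraction  # 100,000 reviewed vs. 1,000,000 linearly

print(found, predicted_responsive, docs_reviewed)
```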

4. How do you anticipate using this technology for Falcon’s clients?

I think it’s easy for people to get swept up by the “latest and greatest” technology or gadget and assume this is the silver bullet for everything we’ve been toiling over before. Take, for example, the smartphone camera – great for a lot of (maybe even most) situations, but sometimes you’re going to want that super zoom lens or even (gasp!) regular film. By the same token, it’s important to recognize that predictive coding is not an across-the-board substitute for other important eDiscovery review technologies and targeted manual review. That said, we’ve leveraged Clearwell to help our clients lower the time and costs of the eDiscovery process on hundreds of cases now, and one of the main benefits is that the solution offers the flexibility of using any number of advanced analytics tools to meet the specific requirements of the case at hand. We’re obviously excited to be able to introduce our clients to this predictive coding technology – and the time and cost benefits it can deliver – but this is in addition to other Clearwell tools, like advanced keyword search, concept or topic clustering, domain filtering, discussion threading and so on, that can and should be used together with predictive coding.

5. Based on your experience, do you have advice for others who may be looking to defensibly reduce the time and cost of document review with predictive coding technology?

The goal of the eDiscovery process is not perfection. At the end of the day, whether you employ a linear review approach and/or leverage predictive coding technology, you need to be able to show that what you did was reasonable and achieved an acceptable level of recall and precision. One of the things you notice with predictive coding is that as you review more documents, the recall and precision scores go up but at a decreasing rate. A key element of a reasonable approach to predictive coding is measuring your review accuracy using a proven statistical sampling methodology. This includes measuring recall and precision accurately to ensure the predictive coding technology is performing as expected. We’re excited to be able to deliver this capability to our clients out of the box with Clearwell, so they can make more informed decisions about their cases early-on and when necessary address concerns of proportionality with opposing parties and the court.
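One simple way to attach a statistical bound to a sampled recall estimate is a normal-approximation confidence interval. The sketch below is a generic illustration, not the methodology built into any particular product, and the counts are assumed.

```python
import math

def recall_with_interval(tp, fn, z=1.96):
    """Recall point estimate with an approximate 95% confidence interval.

    tp and fn are counts from a reviewed random sample (e.g., the control set):
    responsive documents the model found vs. missed. Normal approximation only.
    """
    n = tp + fn
    r = tp / n
    half_width = z * math.sqrt(r * (1 - r) / n)
    return r, max(0.0, r - half_width), min(1.0, r + half_width)

print(recall_with_interval(tp=135, fn=15))   # about 0.90, with roughly +/- 0.048
```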

To find out more about Transparent Predictive Coding, visit http://go.symantec.com/predictive-coding

Mission Impossible? The eDiscovery Implications of the ABA’s New Ethics Rules

Thursday, August 30th, 2012

The American Bar Association (ABA) recently announced changes to its Model Rules of Professional Conduct that are designed to address digital age challenges associated with practicing law in the 21st century. These changes emphasize that lawyers must understand the ins and outs of technology in order to provide competent representation to their clients. From an eDiscovery perspective, such a declaration is particularly important given the lack of understanding that many lawyers have regarding even the most basic supporting technology needed to effectively satisfy their discovery obligations.

With respect to the actual changes, the amendment to the commentary language from Model Rule 1.1 was most significant for eDiscovery purposes. That rule, which defines a lawyer’s duty of competence, now requires that attorneys discharge that duty with an understanding of the “benefits and risks” of technology:

To maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology, engage in continuing study and education and comply with all continuing legal education requirements to which the lawyer is subject.

This rule certainly restates the obvious for experienced eDiscovery counsel. Indeed, the Zubulake series of opinions from nearly a decade ago laid the groundwork for establishing that competence and technology are irrevocably and inextricably intertwined. As Judge Scheindlin observed in Zubulake V, “counsel has a duty to effectively communicate to her client its discovery obligations so that all relevant information is discovered, retained, and produced.” This includes being familiar with client retention policies, in addition to its “data retention architecture;” communicating with the “client’s information technology personnel” and arranging for the “segregation and safeguarding of any archival media (e.g., backup tapes) that the party has a duty to preserve.”

Nevertheless, Model Rule 1.1 is groundbreaking in that it formally requires lawyers in those jurisdictions following the Model Rules to be up to speed on the impact of eDiscovery technology. In 2012, that undoubtedly means counsel should become familiar with the benefits and risks of predictive coding technology. With its promise of reduced document review costs and decreased legal fees, counsel should closely examine predictive coding solutions to determine whether they might be deployed in some phase of the document review process (e.g., prioritization, quality assurance for linear review, full scale production). Yet caution should also be exercised given the risks associated with this technology, particularly the well-known limitations of early generation predictive coding tools.

In addition to predictive coding, lawyers would be well served to better understand traditional eDiscovery technology tools such as keyword search, concept search, email threading and data clustering. Indeed, there is significant confusion regarding the continued viability of keyword searching given some prominent judicial opinions frowning on so-called blind keyword searches. However, most eDiscovery jurisprudence and authoritative commentators confirm the effectiveness of keyword searches that involve some combination of testing, sampling and iterative feedback.
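A testing-and-sampling loop for keyword terms can be sketched in a few lines; the sample size and workflow below are illustrative assumptions rather than a prescribed protocol.

```python
import random

def sample_hits(hit_ids, sample_size=100, seed=7):
    """Random sample of documents that hit the current keyword set, for attorney review."""
    rng = random.Random(seed)
    return rng.sample(hit_ids, min(sample_size, len(hit_ids)))

def keyword_precision(reviewed_sample):
    """Fraction of sampled hits the reviewers judged relevant.

    `reviewed_sample` is assumed to be a list of booleans from the attorney review.
    """
    return sum(reviewed_sample) / len(reviewed_sample)

# Iterate: if precision is poor, tighten or replace terms and re-test;
# if known key documents are being missed, broaden terms and re-sample.
```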

Whether the technology involves predictive coding, keyword searching, attorney client privilege reviews or other areas of eDiscovery, the revised Model Rules appear to require counsel to understand the benefits and risks of these tools. Moreover, this is not simply a one-time directive. Because technology is always changing, lawyers should continue to stay abreast of changes and developments. This continuing duty of competence is well summarized in The Sedona Conference Best Practices Commentary on the Use of Search & Retrieval Methods in E-Discovery:

Parties and the courts should be alert to new and evolving search and information retrieval methods. What constitutes a reasonable search and information retrieval method is subject to change, given the rapid evolution of technology. The legal community needs to be vigilant in examining new and emerging techniques and methods which claim to yield better search results.

While the challenge of staying abreast of these complex technological changes is difficult, it is certainly not “mission impossible.” Lawyers untrained in the areas of technology have often developed tremendous skill sets required for dealing with other areas of complexities in the law. Perhaps the wise but encouraging reminder from Anthony Hopkins to Tom Cruise in Mission Impossible II will likewise spur reluctant attorneys to accept this difficult, though not impossible task: “Well this is not Mission Difficult, Mr. Hunt, it’s Mission Impossible. Difficult should be a walk in the park for you.”

Clean Sweep in Kleen Products Predictive Coding Battle? Not Exactly

Friday, August 24th, 2012

The tears of sadness shed by those in the eDiscovery community lamenting the end of the predictive coding debate in Kleen Products may turn to tears of joy when they realize that the debate could resurface next year. Despite early reports, the Plaintiffs in Kleen did not completely roll over on their argument that defendants should be required to use what they characterize as “Content Based Advanced Analytics” (“CBAA”). To the contrary, Plaintiffs preserved their right to meet and confer with Defendants about future document productions after October 1, 2013. Not surprisingly, future document productions could rekindle the fiery debate about the use of predictive coding technology.

The controversy surrounding Kleen Products, LLC, et al. v. Packaging Corporation of America, et al. was sparked earlier this year when Plaintiffs asked Judge Nolan to order Defendants to redo their previous productions and all future productions using CBAA. Among other things, Plaintiffs claimed that if Defendants had used “CBAA” tools (a term they did not define), such as predictive coding technology, then their production would have been more thorough. In June, I reported that hearing transcripts indicated 7th Circuit Magistrate Judge Nan Nolan was urging the parties to focus on developing a mutually agreeable keyword approach to eDiscovery instead of debating whether other search and review methodologies would yield better results. This nudging by Judge Nolan was not surprising, considering that at least some of the defendants had already spent considerable time and money managing the document production process using more traditional tools other than predictive coding.

In a new twist, reports from other sources surfaced recently, suggesting that the Plaintiffs in Kleen decided to completely withdraw their demands that Defendants use predictive coding during discovery. The news likely disappointed many in the electronic discovery space poised to witness a third round of expert testimony pitting more traditional eDiscovery approaches against predictive coding technology. However, any such disappointment is premature because those dreaming of an eDiscovery showdown in Kleen could still see their dreams come true next year.

On August 21, Judge Nolan did indeed sign a joint “Stipulation and Order Relating to ESI Search.” However, in the order the Plaintiffs withdrew “their demand that defendants apply CBAA to documents contained in the First Request Corpus (emphasis added).” Plaintiffs go on to stipulate that they will not “argue or contend that defendants should be required to use or apply the types of CBAA or “predictive coding” methodology… with respect to any requests for production served on any defendant prior to October 1, 2013 (emphasis added).” Importantly, the Plaintiffs preserved their right to meet and confer regarding the appropriate search methodology to be used for future collections if discovery continues past October of next year.

Considering the parties have only scratched the surface of discovery thus far, the likelihood that the predictive coding issue will resurface again is high unless settlement is reached or Defendants have a change of heart. In short, the door is still wide open for Plaintiffs to argue that Defendants should be required to use predictive coding technology to manage future productions, and rumors about the complete demise of predictive coding in the Kleen Products case have been exaggerated.