
Posts Tagged ‘production’

From A to PC – Running a Defensible Predictive Coding Workflow

Tuesday, September 11th, 2012

So far in our ongoing predictive coding blog series, we’ve touched on the “whys” and “whats” of predictive coding, and now I’d like to address the “hows” of using this new technology. Given that predictive coding is groundbreaking technology in the world of eDiscovery, it’s no surprise that a different workflow is required in order to run the review process.

The traditional linear review process utilizes a “brute force” approach of manually reading each document and processing it for responsiveness and privilege. In order to reduce the high cost of this process, many organizations now farm out documents to contract attorneys for review. Often, however, contract attorneys possess less expertise and knowledge of the issues, which means that multiple review passes along with additional checks and balances are often needed in order to ensure review accuracy. This process commonly results in a significant number of documents being reviewed multiple times, which in turn increases the cost of review. When you step away from an “eyes-on review” of every document and use predictive coding to leverage the expertise of more experienced attorneys, you will naturally aim to review as few documents as possible in order to achieve the best possible results.

How do you review the minimum number of documents with predictive coding? For starters, organizations should prepare a case for predictive coding by performing an early case assessment (ECA) to cull down to the review population before review begins. While some suggest that predictive coding can be run without any ECA up front, you will save a significant amount of review time if you put in the effort to cull out the profoundly irrelevant documents in the case. Doing so prevents a “junk in, junk out” situation in which leaving too much junk in the case forces you to review junk documents throughout the predictive coding workflow.

Next, it is important to segregate documents that are unsuitable for predictive coding. Most predictive coding solutions rely on the extracted text content within documents. That means any documents that contain no extracted text, such as photographs and engineering schematics, should be reviewed manually so they are not overlooked by the predictive coding engine. The same applies to documents with other limitations that prevent analysis, such as encrypted and password-protected files. All of these documents should be reviewed separately so that no relevant documents are missed.

After culling down to your review population, the next step in preparing to use predictive coding is to create a Control Set by drawing a randomly selected statistical sample from the document population. Once the Control Set is manually reviewed, it will serve two main purposes. First, it will allow you to estimate the population yield, otherwise referred to as the percentage of responsive documents contained within the larger population. (The size of the Control Set may need to be adjusted to ensure the yield is properly taken into account.) Second, it will serve as your baseline for a true “apples-to-apples” comparison of your prediction accuracy across iterations as you move through the predictive coding workflow. The Control Set only needs to be reviewed once up front and can then be used for measuring accuracy throughout the workflow.
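The post does not prescribe a particular sampling mechanism, but as a rough illustration, here is a minimal Python sketch of one common approach: sizing the Control Set with the standard sample-size formula (plus a finite population correction) and then drawing a simple random sample. The function names, document IDs and the 95% confidence / ±5% margin parameters are assumptions made for the example, not part of any particular product or the original workflow.

```python
import math
import random

def control_set_size(population_size, confidence_z=1.96, margin_of_error=0.05, expected_yield=0.5):
    """Estimate a statistically valid Control Set size.

    Uses the standard sample-size formula with a finite population
    correction. expected_yield=0.5 is the most conservative assumption;
    when the estimated yield is very low, a larger sample may be needed
    to capture enough responsive documents.
    """
    n0 = (confidence_z ** 2) * expected_yield * (1 - expected_yield) / (margin_of_error ** 2)
    n = n0 / (1 + (n0 - 1) / population_size)  # finite population correction
    return math.ceil(n)

def draw_control_set(document_ids, size, seed=42):
    """Draw a simple random sample from the entire review population."""
    rng = random.Random(seed)
    return rng.sample(document_ids, size)

# Example: a hypothetical 1,000,000-document population at 95% confidence, ±5% margin
population = [f"DOC-{i:07d}" for i in range(1_000_000)]
size = control_set_size(len(population))        # roughly 385 documents
control_set = draw_control_set(population, size)
```

Under these assumptions roughly 385 documents suffice; tightening the margin of error, or adjusting for a low observed yield, pushes that number up.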

It is essential that the documents in the Control Set are selected randomly from the entire population. While some believe that other sampling approaches give better peace of mind, they may actually result in unnecessary review. For example, other workflows recommend sampling from the documents that are not predicted to be relevant to see if anything was left behind. If you instead create a proper Control Set from the entire population, you get precision and recall metrics that are representative of the entire population, which includes the documents that are not predicted to be relevant.

Once the Control Set is created, you can begin training the software to evaluate documents against the review criteria in the case. Selecting the optimal set of documents to train the system (commonly referred to as the training set or seed set) is one of the most important steps in the entire predictive coding workflow because it sets the initial accuracy of the system, so it should be chosen carefully. Some suggest creating the initial training set by taking a random sample from the population (much like how the Control Set is selected) instead of proactively selecting responsive documents. The important thing to understand, however, is that the items used for training must adequately represent the responsive items. Selecting responsive documents for inclusion in the training set matters because most eDiscovery cases have low yield – meaning the prevalence of responsive documents within the overall document population is low. The system will not be able to effectively learn how to identify responsive items if too few responsive documents are included in the training set.

An effective method for selecting the initial training set is to use a targeted search to locate a small set of documents (typically between 100 and 1,000) that is expected to be about 50% responsive. For example, you might focus only on the key custodians in the case and use a combination of tighter keyword, date range, and similar search criteria, as sketched below. You do not have to perform exhaustive searches, but a high-quality initial training set will likely minimize the amount of additional training needed to achieve high prediction accuracy.
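As a rough illustration of the targeted search described above, the sketch below filters a hypothetical document collection by custodian, date window and a tighter keyword list. The field names (custodian, date, text), the helper function and the example custodians and keywords are assumptions made for the example, not a prescribed implementation.

```python
from datetime import date

def initial_training_candidates(documents, key_custodians, keywords, start, end):
    """Targeted search for an initial training set.

    Filters to key custodians, a date window and a tighter keyword list,
    aiming for a small candidate pool (roughly 100-1,000 documents) that
    is expected to be close to half responsive.
    """
    return [
        doc for doc in documents
        if doc["custodian"] in key_custodians
        and start <= doc["date"] <= end
        and any(kw.lower() in doc["text"].lower() for kw in keywords)
    ]

# Hypothetical usage: documents is a list of dicts with custodian/date/text fields
# candidates = initial_training_candidates(documents, {"J. Smith", "A. Jones"},
#                                          ["contract", "pricing"],
#                                          date(2009, 1, 1), date(2010, 12, 31))
```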

After the initial training set is selected, it must then be reviewed. It is extremely important that the review decisions made on any training items are as accurate as possible since the system will be learning from these items, which typically means that the more experienced case attorneys should handle this review. Once review is finished on all of the training documents, the system can learn from the tagging decisions in order to predict the responsiveness or non-responsiveness of the remaining documents.

While you could now generate predictions for every other document in the population, the most important set to predict on at this stage is the Control Set. Not only is this likely to be more time-effective than applying predictions to all the documents in the case, but you will need predictions on all of the Control Set documents in order to assess the accuracy of the predictions. With both predictions and tagging decisions on each Control Set document, you can calculate precision and recall metrics that extrapolate to the entire review population.
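Precision and recall follow their standard definitions here: precision is the share of predicted-responsive documents that reviewers actually tagged responsive, and recall is the share of reviewer-tagged responsive documents the system found. A minimal sketch, assuming each Control Set document carries a human tag and a predicted tag:

```python
def control_set_metrics(reviewed_pairs):
    """Compute precision and recall from a reviewed Control Set.

    `reviewed_pairs` is a list of (human_tag, predicted_tag) tuples,
    where each tag is True for responsive and False for non-responsive.
    """
    tp = sum(1 for human, pred in reviewed_pairs if human and pred)
    fp = sum(1 for human, pred in reviewed_pairs if not human and pred)
    fn = sum(1 for human, pred in reviewed_pairs if human and not pred)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical usage with the reviewed Control Set documents:
# precision, recall = control_set_metrics(reviewed_pairs)
```

Because the Control Set was drawn randomly from the whole population, these two numbers extrapolate to the full review population within the sample's margin of error.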

At this point, the accuracy of the predictions is unlikely to be optimal, and the iterative process begins. To increase accuracy, you must select additional documents to use for training the system. Much like the initial training set, this additional training set must be selected carefully. The best documents to add are those that the system cannot yet predict accurately. Rather than having reviewers choose these documents manually, the software can often determine this set mathematically more effectively than human reviewers. Once these documents are selected, you simply continue the iterative process of training, predicting and testing until your precision and recall reach an acceptable point. Following this workflow results in a set of documents identified as responsive by the system, along with trustworthy and defensible accuracy metrics.
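The post does not say how the software identifies the documents it cannot yet predict accurately, but one common technique is uncertainty sampling: pick the documents whose predicted responsiveness scores sit closest to the decision boundary. The sketch below assumes the engine exposes a per-document responsiveness probability, which is an assumption for illustration rather than a claim about any particular product.

```python
def select_additional_training(doc_scores, batch_size=200):
    """Pick the documents the model is least certain about.

    `doc_scores` maps document IDs to a predicted probability of
    responsiveness between 0.0 and 1.0. Documents scored near 0.5 are
    the ones the system can least reliably predict, so reviewing them
    tends to teach it the most per document reviewed.
    """
    return sorted(doc_scores, key=lambda d: abs(doc_scores[d] - 0.5))[:batch_size]

# Hypothetical usage:
# next_batch = select_additional_training({"DOC-0000001": 0.48, "DOC-0000002": 0.97})
```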

You cannot simply produce all of these documents at this point, however. The documents must still go through a privileged screen in order to remove any documents that should not be produced, and also go through any other review measures that you usually take on your responsive documents. This does, however, open up the possibility of applying additional rounds of predictive coding on top of this set of responsive documents. For example, after running the privileged screen, you can train on the privileged tag and attempt to identify additional privileged documents in your responsive set that were missed.

The important thing to keep in mind is that predictive coding is meant to strengthen your current review workflows. While we have outlined one possible workflow that utilizes predictive coding, the flexibility of the technology lends itself to be utilized for a multitude of other uses, including prioritizing a linear review. Whatever application you choose, predictive coding is sure to be an effective tool in your future reviews.

Mission Impossible? The eDiscovery Implications of the ABA’s New Ethics Rules

Thursday, August 30th, 2012

The American Bar Association (ABA) recently announced changes to its Model Rules of Professional Conduct that are designed to address digital age challenges associated with practicing law in the 21st century. These changes emphasize that lawyers must understand the ins and outs of technology in order to provide competent representation to their clients. From an eDiscovery perspective, such a declaration is particularly important given the lack of understanding that many lawyers have regarding even the most basic supporting technology needed to effectively satisfy their discovery obligations.

With respect to the actual changes, the amendment to the commentary language from Model Rule 1.1 was most significant for eDiscovery purposes. That rule, which defines a lawyer’s duty of competence, now requires that attorneys discharge that duty with an understanding of the “benefits and risks” of technology:

To maintain the requisite knowledge and skill, a lawyer should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology, engage in continuing study and education and comply with all continuing legal education requirements to which the lawyer is subject.

This rule certainly restates the obvious for experienced eDiscovery counsel. Indeed, the Zubulake series of opinions from nearly a decade ago laid the groundwork for establishing that competence and technology are irrevocably and inextricably intertwined. As Judge Scheindlin observed in Zubulake V, “counsel has a duty to effectively communicate to her client its discovery obligations so that all relevant information is discovered, retained, and produced.” This includes being familiar with client retention policies, in addition to its “data retention architecture;” communicating with the “client’s information technology personnel” and arranging for the “segregation and safeguarding of any archival media (e.g., backup tapes) that the party has a duty to preserve.”

Nevertheless, Model Rule 1.1 is groundbreaking in that it formally requires lawyers in those jurisdictions following the Model Rules to be up to speed on the impact of eDiscovery technology. In 2012, that undoubtedly means counsel should become familiar with the benefits and risks of predictive coding technology. With its promise of reduced document review costs and decreased legal fees, counsel should closely examine predictive coding solutions to determine whether they might be deployed in some phase of the document review process (e.g., prioritization, quality assurance for linear review, full scale production). Yet caution should also be exercised given the risks associated with this technology, particularly the well-known limitations of early generation predictive coding tools.

In addition to predictive coding, lawyers would be well served to better understand traditional eDiscovery technology tools such as keyword search, concept search, email threading and data clustering. Indeed, there is significant confusion regarding the continued viability of keyword searching given some prominent judicial opinions frowning on so-called blind keyword searches. However, most eDiscovery jurisprudence and authoritative commentators confirm the effectiveness of keyword searches that involve some combination of testing, sampling and iterative feedback.

Whether the technology involves predictive coding, keyword searching, attorney client privilege reviews or other areas of eDiscovery, the revised Model Rules appear to require counsel to understand the benefits and risks of these tools. Moreover, this is not simply a one-time directive. Because technology is always changing, lawyers should continue to stay abreast of changes and developments. This continuing duty of competence is well summarized in The Sedona Conference Best Practices Commentary on the Use of Search & Retrieval Methods in E-Discovery:

Parties and the courts should be alert to new and evolving search and information retrieval methods. What constitutes a reasonable search and information retrieval method is subject to change, given the rapid evolution of technology. The legal community needs to be vigilant in examining new and emerging techniques and methods which claim to yield better search results.

While the challenge of staying abreast of these complex technological changes is difficult, it is certainly not “mission impossible.” Lawyers untrained in the areas of technology have often developed tremendous skill sets required for dealing with other areas of complexities in the law. Perhaps the wise but encouraging reminder from Anthony Hopkins to Tom Cruise in Mission Impossible II will likewise spur reluctant attorneys to accept this difficult, though not impossible task: “Well this is not Mission Difficult, Mr. Hunt, it’s Mission Impossible. Difficult should be a walk in the park for you.”

Conducting eDiscovery in Glass Houses: Are You Prepared for the Next Stone?

Monday, August 27th, 2012

Electronic discovery has been called many names over the years. “Expensive,” “burdensome” and “endless” are just a few of the adjectives that, rightly or wrongly, characterize this relatively new process. Yet a more fitting description may be that of a glass house since the rights and responsibilities of eDiscovery inure to all parties involved in litigation. Indeed, like those who live in glass houses, organizations must be prepared for eDiscovery stones that will undoubtedly be thrown their way during litigation. This potential reciprocity is especially looming for those parties who “cast the first stone” with accusations of spoliation and sanctions motions. If their own eDiscovery house is not in order, organizations may find their home loaded with the glass shards of increased litigation costs and negative publicity.

Such was the case in the blockbuster patent dispute involving technology titans Apple and Samsung Electronics. In Apple, the court first issued an adverse inference instruction against Samsung to address spoliation charges brought by Apple. In particular, the court faulted Samsung for failing to circulate a comprehensive litigation hold instruction when it first anticipated litigation. This eventually culminated in the loss of emails from several key Samsung custodians, inviting the court’s adverse inference sanction.

However, while Apple was raising the specter of spoliation, it had failed to prepare its own eDiscovery glass house for the inevitable stones that Samsung would throw. Indeed, Samsung raised the very same issues that Apple had leveled against Samsung, i.e., that Apple had neglected to implement a timely and comprehensive litigation hold to prevent wholesale destruction of relevant email. Just like Samsung, Apple failed to distribute a hold instruction until several months after litigation was reasonably foreseeable:

As this Court has already determined, this litigation was reasonably foreseeable as of August 2010, and thus Apple’s duty to preserve, like Samsung’s, arose in August 2010. . . . Notwithstanding this duty, Apple did not issue any litigation hold notices until after filing its complaint in April 2011.

Moreover, Apple additionally failed to issue hold notices to several designers and inventors on the patents at issue until many months after the critical August date. These shortcomings, coupled with evidence suggesting that Apple employees were “encouraged to keep the size of their email accounts below certain limits,” ultimately led the court to conclude that Apple destroyed documents after its preservation duty ripened. To address Apple’s spoliation, the court issued an adverse inference identical to the instruction it levied on Samsung.[1]

While there are many lessons learned from the Apple case, perhaps none stands out more than the “glass house” rule: an organization that calls the other side’s preservation and production efforts into doubt must have its own house prepared for reciprocal allegations. Such preparations include following the golden rules of eDiscovery and integrating upstream information retention protocols into downstream eDiscovery processes. By making such preparations, organizations can reinforce their glass eDiscovery house with the structural steel of information governance, lessening the risk of sanctions and other negative consequences.



[1] The district court modified and softened the magistrate’s original instruction issued against Samsung given the minor prejudice that Apple suffered as a result of Samsung’s spoliation. The revised instruction against Samsung, along with the matching instruction against Apple, was ultimately never read to the jury given their offsetting nature.

Clean Sweep in Kleen Products Predictive Coding Battle? Not Exactly

Friday, August 24th, 2012

The tears of sadness shed by those in the eDiscovery community lamenting the end of the predictive coding debate in Kleen Products may turn to tears of joy when they realize that the debate could resurface next year. Despite early reports, the Plaintiffs in Kleen did not completely roll over on their argument that defendants should be required to use what they characterize as “Content Based Advanced Analytics” (“CBAA”). To the contrary, Plaintiffs preserved their right to meet and confer with Defendants about future document productions after October 1, 2013. Not surprisingly, future document productions could rekindle the fiery debate about the use of predictive coding technology.

The controversy surrounding Kleen Products, LLC, et. al. v. Packaging Corporation of America, et. al. was sparked earlier this year when Plaintiffs asked Judge Nolan to order Defendants to redo their previous productions and all future productions using CBAA. Among other things, Plaintiffs claimed that if Defendants had used “CBAA” tools (a term they did not define) such as predictive coding technology, then their production would have been more thorough. In June, I reported that hearing transcripts indicated 7th Circuit Magistrate Judge Nan Nolan was urging the parties to focus on developing a mutually agreeable keyword approach to eDiscovery instead of debating whether other search and review methodologies would yield better results. This nudging by Judge Nolan was not surprising, considering at least some of the defendants had already spent considerable time and money managing the document production process using more traditional tools rather than predictive coding.

In a new twist, reports from other sources surfaced recently, suggesting that the Plaintiffs in Kleen decided to completely withdraw their demands that Defendants use predictive coding during discovery. The news likely disappointed many in the electronic discovery space poised to witness a third round of expert testimony pitting more traditional eDiscovery approaches against predictive coding technology. However, any such disappointment is premature because those dreaming of an eDiscovery showdown in Kleen could still see their dreams come true next year.

On August 21, Judge Nolan did indeed sign a joint “Stipulation and Order Relating to ESI Search.” However, in the order the Plaintiffs withdrew “their demand that defendants apply CBAA to documents contained in the First Request Corpus (emphasis added).” Plaintiffs go on to stipulate that they will not “argue or contend that defendants should be required to use or apply the types of CBAA or “predictive coding” methodology… with respect to any requests for production served on any defendant prior to October 1, 2013 (emphasis added).” Importantly, the Plaintiffs preserved their right to meet and confer regarding the appropriate search methodology to be used for future collections if discovery continues past October of next year.

Considering the parties have only scratched the surface of discovery thus far, the likelihood that the predictive coding issue will resurface again is high unless settlement is reached or Defendants have a change of heart. In short, the door is still wide open for Plaintiffs to argue that Defendants should be required to use predictive coding technology to manage future productions, and rumors about the complete demise of predictive coding in the Kleen Products case have been exaggerated.

Magic 8 Ball Predictions for eDiscovery in Florida: FRCP, FOIA and the Sunshine Laws

Thursday, August 23rd, 2012

The Sunshine State is shining a new ray of light on the information governance and eDiscovery space with new civil procedure laws addressing electronically stored information (ESI). The new rules, which go into effect September 1, 2012, are six years in the making and the product of many iterations and much debate amongst practitioners, neutrals and jurists. While they generally mirror the Federal Rules of Civil Procedure (FRCP) and embrace much of Sedona’s Cooperation Proclamation, there are some marked procedural differences, though the rules generally accomplish the same goals.

For example, instead of mandating a meet and confer conference (a la the FRCP), the new state rules provide for these negotiations in a case management conference pursuant to Rules 1.200-1.201. None of the Florida rules is a surprise, since they wisely promote early discussions regarding potential discovery problems, understanding of information management systems, and competency on the part of lawyers and their clients to effectively address litigation hold practices and preservation – just as the FRCP do.

There are comprehensive blogs that have already covered the nuts and bolts of how the rules change the practice of law in Florida with regard to ESI, as well as a fantastic video featuring Judge Richard Nielsen, who piloted these principles in his Florida court. Perhaps the most interesting legal issues facing Florida concern the intersection of the new rules with open government and recordkeeping, and what the government’s burden to produce metadata will be going forward.

This is not to say the private sector won’t have to make changes as well, because anyone litigating in Florida should take eDiscovery seriously given recent cases like Coquina Investments v. Rothstein. In this case, Judge Marcia Cooke boldly sanctioned the defendant(s) and their lawyers for failing to preserve, search and produce information relevant to the case. One of the issues in the case involved format; paper documents were produced by the defendant when they should have been electronically produced with relevant metadata.

The federal government has had a brush with this nexus, although it remains unresolved. In the NDLON case, Judge Scheindlin initially ordered the government to produce select metadata, but subsequently retracted her ruling. Critics of the initial holding claim she confused the discovery requirements of the FRCP and the Freedom of Information Act (FOIA). While the two have different legal standards – reasonableness under FOIA and proportionality under the FRCP – this issue is a red herring.

The differing standards are not the true issue; the ability to conduct a thorough search to retrieve relevant information and produce metadata appropriately is the crux. FOIA is in many cases a more stringent standard than that of the FRCP, and this puts even more pressure on the government to improve its technology. The simple premise that documents should be produced in the manner in which they were created or, alternatively, with all of the characteristics necessary to the merits of a proceeding, is not technologically difficult to attain. Nor is the redaction of sensitive information due to relevance or an exemption.

Florida’s most luminary legal contribution to information governance up until this point has been the most comprehensive body of legislation in the United States addressing the right to information and access to public records (Sunshine Laws). Early on, Florida embraced the concept that information created by the government needs to be accessible to the public, and has adopted policies and technologies to address this responsibility.

Florida has historically been the most transparent of all the states and proactive about clarifying how certain communications (specifically ESI) become public records. In the near future, these laws will further force Florida into becoming the most progressive state with regard to their information management and in-house eDiscovery capabilities. More than the laws being on the books, the sheer number of lawsuits increasingly involving the Sunshine Laws and ESI will be the impetus for much of this technological innovation.

Today we are in the age of information governance, and at the dawn of mainstream predictive coding for litigation. Increasingly, organizations are archiving information and deploying in-house eDiscovery capabilities pursuing the promise of gaining control of data, limiting risk, and deriving value from their data. The fact that civil litigants are suing the government frequently under the FOIA and Sunshine Laws creates a nexus that must and will be resolved in the near future.

The most brilliant part of the first NDLON ruling regarding metadata was that it spoke to the concept of the FRCP and FOIA being aligned. Both are requests for production, and while they carry differing legal standards, it is inefficient to conduct those searches in a different, unrelated manner once an information governance infrastructure has been implemented. When the two collide, a party has to contend with both, and the new rules will bring this issue to resolution. The tools used for a discovery request can and should be the same as those used to comply with a FOIA production – and they should be in place from the start. For a state like Florida, a case involving the Sunshine Laws will eventually pose this question, but now under more ESI-savvy rules. Florida cannot afford to reinvent the wheel or scramble to comply with requests; a proactive infrastructure needs to be in place.

Florida’s new rules will impact civil litigation involving all areas of state and local government, as well as state-funded educational institutions. Questions about format, employee self-collection, retention and litigation hold are going to get very hot in the Sunshine State because the government is more accountable there. As Louis Brandeis said, “Sunlight is said to be the best of disinfectants; electric light the most efficient policeman.” This may be a rare case of state case law driving federal rulemaking, coupled with a need for technological advancement on the government’s part.

Addressing the Challenges of Cross-Border Data Protection and eDiscovery

Friday, August 17th, 2012

One of the more troubling eDiscovery issues that globalization has inadvertently imposed on organizations is compliance with a complex set of international data protection and privacy laws. These laws present a significant challenge to U.S. companies, which enjoy fewer domestic restraints on collecting and storing the personal data of their employees and consumers.

It’s not that these laws are unfamiliar concepts to U.S. corporations. Contrary to popular belief, statutes and regulations do exist in the U.S. to help protect certain personal and financial information from unauthorized disclosure. Nevertheless, the U.S. approach to data protection is mostly patchwork and is unmatched by the comprehensive framework in other regions, particularly in Europe.

Data Protection in Europe

The data protection regime adopted by the European Union (EU) presents unique information governance challenges to even the most sophisticated organizations. Developed to address the abuses of twentieth century fascism and communism, the EU system emphasizes the importance of securing personal information from unreasonable government and corporate intrusions. To guard against such intrusions, the EU member states have enacted laws that curtail processing, collection and storage of personal data. For example, European laws generally prevent organizations from processing personal information unless it is done for a lawful purpose and is not excessive. Furthermore, personal data may not be maintained longer than is necessary and must be properly secured.

Beyond these basic data protection principles, certain countries in Europe provide additional safeguards. In Germany, for instance, state governments have implemented their own data privacy provisions that are exclusive of and, in the case of the German state of Schleswig-Holstein, more exacting than the larger EU protection framework. Furthermore, corporate data processing in Germany must satisfy company Works Councils, which represent the interests of employees and protect their privacy rights.

The Clash between Data Protection Laws and Litigation Discovery Rules

A significant area of complexity facing organizations with respect to the governance of personal information concerns the treatment of that data in European and cross-border litigation. In domestic European litigation, personal data could be subject to discovery if it supports the claims of the parties or a court orders its disclosure. That could place an organization in the tricky position of having to produce personal data that may very well be protected by privacy laws. While legal exceptions do exist for these situations, the person whose data is subject to disclosure may nonetheless seek to prevent its dissemination on privacy grounds. Furthermore, company Works Councils and Data Protection Officers may object to these disclosures.

Additional difficulty may arise when addressing international discovery requests that seek personal information. Companies whose European offices receive these requests must ensure that the country where the data will be transferred has enacted laws that meet EU data protection standards. Transfers of personal data to countries that do not meet those standards are generally forbidden, with fines and even prison time imposed for non-compliance.

Certain countries have more stringent rules regarding proposed transfers of personal information. In France, for example, international discovery requests that seek personal data must comply with the rules promulgated by the French data protection authority, La Commission Nationale de l’Informatique et des Libertés (CNIL). Those rules require that the CNIL and the data subjects be notified of the proposed data transfer. In addition, disclosures must be limited to relevant information, with appropriate redactions of data that could be used to identify the data subjects.

Additional complications may arise for enterprises whose European offices have been served with discovery requests from the U.S. Despite the restrictions imposed by European data protection authorities and the penalties for noncompliance, organizations are often compelled by U.S. courts to produce personal information without regard to these laws. Such organizations are caught between U.S. court sanctions on the one hand and, on the other, fines and possibly even jail time under European data protection laws.

Using Information Governance to Solve the Data Protection Conundrum

Given the complexity of ensuring conformity with foreign privacy rules and the penalties for noncompliance, organizations should consider developing an information governance strategy to effectively address these issues. Such an approach will typically require the data management principals (legal and IT) to work together on the myriad of legal and logistical issues surrounding information retention.

Legal and IT should also develop a process for how the organization will address data preservation and production during litigation. Where applicable, Works Councils and Data Protection Officers should be involved in the process to ensure that data protection laws are properly observed and employee privacy rights are safeguarded.

An effective governance strategy should also incorporate effective, enabling technologies to meet company information management goals while observing data protection laws. Archiving software, data loss prevention functionality and eDiscovery tools are all examples of technologies that together provide the means to protect personal information processed in connection with an organization’s information governance strategy.

By following these steps, organizations will be better prepared for the challenges of addressing cross-border data protection laws and the legal traps that are inextricably intertwined with globalization.

FOIA Matters! — 2012 Information Governance Survey Results for the Government Sector

Thursday, July 12th, 2012

At this year’s EDGE Summit in April, Symantec polled attendees about a range of government-specific information governance questions. The attendees primarily came from IT and Legal, along with Freedom of Information Act (FOIA) agents, government investigators and records managers. The main purpose of the EDGE survey was to gather attendees’ thoughts on what information governance means for their agencies, discern what actions were being taken to address Big Data challenges, and assess how far along agencies were in their information governance implementations pursuant to the recent Presidential Mandate.

As my colleague Matt Nelson’s blog recounts from the LegalTech conference earlier this year, information governance and predictive coding were among the hottest topics at the LTNY 2012 show and in the industry generally. The EDGE Summit correspondingly held sessions on those two topics and delved deeper into questions that are unique to the government. For example, when asked what the top driver for implementation of an information governance plan in an agency was, three out of four respondents answered “FOIA.”

The fact that FOIA was listed as the top driver for government agencies planning to implement an information governance solution is in line with data reported by the Department of Justice (DOJ) from 2008-2011 on the number of requests received. In 2008, agencies received 605,491 FOIA requests; that figure grew to 644,165 in 2011. While the increase in FOIA requests is not enormous in percentage terms, the reduction in FOIA backlogs is significant: the backlog fell from 130,419 requests in 2008 to 83,490 by 2011. This is likely due to the implementation of newer and better technology, coupled with the fact that the current administration has made FOIA request processing a priority.
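To make the comparison concrete, a quick calculation on the DOJ figures cited above shows roughly a 6% rise in requests against roughly a 36% drop in the backlog:

```python
requests_2008, requests_2011 = 605_491, 644_165
backlog_2008, backlog_2011 = 130_419, 83_490

request_growth = (requests_2011 - requests_2008) / requests_2008    # about 0.064, a ~6% increase
backlog_reduction = (backlog_2008 - backlog_2011) / backlog_2008    # about 0.36, a ~36% decrease
```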

In 2009, President Obama directed agencies to adopt “a presumption in favor” of FOIA requests for greater transparency in the government. Agencies have been under pressure from the President to improve the response time to (and completeness of) FOIA requests. Washington Post reporter Ed O’Keefe wrote,

“a study by the National Security Archive at George Washington University and the Knight Foundation, found approximately 90 federal agencies are equipped to process FOIA requests, and of those 90, only slightly more than half have taken at least some steps to fulfill Obama’s goal to improve government transparency.”

Agencies are increasingly more focused on complying with FOIA and will continue to improve their IT environments with archiving, eDiscovery and other proactive records management solutions in order to increase access to data.

Not far behind FOIA requests on the list of reasons to implement an information governance plan were “lawsuits” and “internal investigations.” Fortunately, any comprehensive information governance plan will axiomatically address FOIA requests since the technology implemented to accomplish information governance inherently allows for the storage, identification, collection, review and production of data regardless of the specific purpose. The use of information governance technology will not have the same workflow or process for FOIA that an internal investigation would require, for example, but the tools required are the same.

The survey also found that the top three most important activities surrounding information governance were: email/records retention (73%), data security/privacy (73%) and data storage (72%). These concerns are being addressed modularly by agencies with technology like data classification services, archiving, and data loss prevention technologies. In-house eDiscovery tools are also important as they facilitate the redaction of personally identifiable information that must be removed in many FOIA requests.

It is clear that agencies recognize the importance of managing email/records for the purposes of FOIA and this is an area of concern in light of not only the data explosion, but because 53% of respondents reported they are responsible for classifying their own data. Respondents have connected the concept of information governance with records management and the ability to execute more effectively on FOIA requests. Manual classification is rapidly becoming obsolete as data volumes grow, and is being replaced by automated solutions in successfully deployed information governance plans.

Perhaps the most interesting piece of data from the survey was the disclosure of what was preventing governmental agencies from implementing information governance plans. The top inhibitors for the government were “budget,” “internal consensus” and “lack of internal skill sets.” Contrasted with the 2012 LegalTech survey findings on information governance, where respondents came predominantly from the private sector, the government’s concerns and implementation timelines are slightly different. In the EDGE survey, only 16% of the government respondents reported that they have implemented an information governance solution, contrasted with 19% of the LegalTech audience. This disparity is partly because the government lacks the budget and the proper internal committee of stakeholders to sponsor and deploy a plan, but the relatively low numbers in both sectors indicate the nascent state of information governance.

In order for a successful information governance plan to be deployed, “it takes a village,” to quote Secretary Clinton. Without prioritizing coordination between IT, legal, records managers, security, and the other necessary departments on data management, merely having the budget only purchases the technology and does not ensure true governance. In this year’s survey, 95% of EDGE respondents were actively discussing information governance solutions. Over the next two years, the percentage of agencies that have implemented a solution is expected to more than triple, from 16% to 52%. With the directive on records management due this month from the National Archives and Records Administration (NARA), government agencies will have clear guidance on best practices for records management, and this will aid the adoption of automated archiving and records classification workflows.

The future is bright with the initiative by the President and NARA’s anticipated directive to examine the state of technology in the government. The EDGE survey results support the forecast, provided budget can be obtained, that agencies will be in an improved state of information governance within the next two years. This will improve FOIA request compliance, make litigation involving the government more efficient, and increase agencies’ ability to conduct internal investigations effectively.

Many would have projected that the top drivers of information governance in the government would be litigation, internal investigations, and FOIA requests, in that order. And yet, FOIA has recently taken on a more important role given the Obama administration’s focus on transparency and the increased number of requests by citizens. While any one of the drivers could have prompted the updates in process and technology the government clearly needs, FOIA has positive momentum behind it and seems to be the impetus primarily driving information governance. Fortunately, archiving and eDiscovery technology, only two parts of the information governance continuum, can help with all three of the aforementioned drivers, albeit through different workflows.

Later this month we will examine NARA’s directive and what the impact will be on the government’s technology environment – stay tuned.

Gartner’s 2012 Magic Quadrant for E-Discovery Software Looks to Information Governance as the Future

Monday, June 18th, 2012

Gartner recently released its 2012 Magic Quadrant for E-Discovery Software, which is its annual report analyzing the state of the electronic discovery industry. Many vendors in the Magic Quadrant (MQ) may initially focus on their position and the juxtaposition of their competitive neighbors along the Completeness of Vision and Ability to Execute axes. While that is a very useful exercise, there are also a number of additional nuggets in the MQ, particularly regarding Gartner’s overview of the market, anticipated rates of consolidation and future market direction.

Context

For those of us who’ve been around the eDiscovery industry since its infancy, it’s gratifying to see it mature. As Gartner concludes, the promise of this industry isn’t off in the future; it’s now:

“E-discovery is now a well-established fact in the legal and judicial worlds. … The growth of the e-discovery market is thus inevitable, as is the acceptance of technological assistance, even in professions with long-standing paper traditions.”

The past wasn’t always so rosy, particularly when the market was dominated by hundreds of service providers that seemed to hold on by maintaining a few key relationships, combined with relatively high margins.

“The market was once characterized by many small providers and some large ones, mostly employed indirectly by law firms, rather than directly by corporations. …  Purchasing decisions frequently reflected long-standing trusted relationships, which meant that even a small book of business was profitable to providers and the effects of customary market forces were muted. Providers were able to subsist on one or two large law firms or corporate clients.”

Consolidation

The Magic Quadrant correctly notes that these “salad days” just weren’t feasible long term. Gartner sees the pace of consolidation heating up even further, with some players striking it rich and some going home empty handed.

“We expect that 2012 and 2013 will see many of these providers cease to exist as independent entities for one reason or another — by means of merger or acquisition, or business failure. This is a market in which differentiation is difficult and technology competence, business model rejuvenation or size are now required for survival. … The e-discovery software market is in a phase of high growth, increasing maturity and inevitable consolidation.”

Navigating these treacherous waters isn’t easy for eDiscovery providers, nor is it simple for customers to make purchasing decisions if they’re correctly concerned that the solution they buy today won’t be around tomorrow. Yet, despite the prognostication of an inevitable shakeout (Gartner forecasts that the market will shrink by 25% in the raw number of firms claiming eDiscovery products/services), Gartner remains very bullish about the sector.

“Gartner estimates that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010. The five-year CAGR to 2015 is approximately 16%.”

This certainly means there’s a window of opportunity for certain players – particularly those who help larger players fill out their EDRM suite of offerings, since the best of breed era is quickly going by the wayside.  Gartner notes that end-to-end functionality is now table stakes in the eDiscovery space.

“We have seen a large upsurge in user requests for full-spectrum EDRM functionality. Whether that functionality will be used initially, or at all, remains an open question. Corporate buyers do seem minded to future-proof their investments in this way, by anticipating what they may wish to do with the software and the vendor in the future.”

Information Governance

Not surprisingly, it’s this “full-spectrum” functionality that most closely aligns with marrying the reactive, right side of the EDRM with the proactive, left side.  In concert, this yin and yang is referred to as information governance, and it’s this notion that’s increasingly driving buying behaviors.

“It is clear from our inquiry service that the desire to bring e-discovery under control by bringing data under control with retention management is a strategy that both legal and IT departments pursue in order to control cost and reduce risks. Sometimes the archiving solution precedes the e-discovery solution, and sometimes it follows it, but Gartner clients that feel the most comfortable with their e-discovery processes and most in control of their data are those that have put archiving systems in place …”

As Gartner looks out five years, the analyst firm anticipates more progress on the information governance front, because the “entire e-discovery industry is founded on a pile of largely redundant, outdated and trivial data.”  At some point this digital landfill is going to burst and organizations are finally realizing that if they don’t act now, it may be too late.

“During the past 10 to 15 years, corporations and individuals have allowed this data to accumulate for the simple reason that it was easy — if not necessarily inexpensive — to do so. … E-discovery has proved to be a huge motivation for companies to rethink their information management policies. The problem of determining what is relevant from a mass of information will not be solved quickly, but with a clear business driver (e-discovery) and an undeniable return on investment (deleting data that is no longer required for legal or business purposes can save millions of dollars in storage costs) there is hope for the future.”

 

The Gartner Magic Quadrant for E-Discovery Software is insightful for a number of reasons, not the least of which is how it portrays the developing maturity of the electronic discovery space. In just a few short years, the niche has sprouted wings, raced to $1B and is seeing massive consolidation. As we enter the next phase of maturation, we’ll likely see the sector morph into a larger, information governance play, given customers’ “full-spectrum” functionality requirements and the presence of larger, mainstream software companies.  Next on the horizon is the subsuming of eDiscovery into both the bigger information governance umbrella, as well as other larger adjacent plays like “enterprise information archiving, enterprise content management, enterprise search and content analytics.” The rapid maturation of the eDiscovery industry will inevitably result in growing pains for vendors and practitioners alike, but in the end we’ll all benefit.

 

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Kleen Products Predictive Coding Update – Judge Nolan: “I am a believer of principle 6 of Sedona”

Tuesday, June 5th, 2012

Recent transcripts reveal that 7th Circuit Magistrate Judge Nan Nolan has urged the parties in Kleen Products, LLC, et. al. v. Packaging Corporation of America, et. al. to focus on developing a mutually agreeable keyword search strategy for eDiscovery instead of debating whether other search and review methodologies would yield better results. This is big news for litigators and others in the electronic discovery space because many perceived Kleen Products as potentially putting keyword search technology on trial, compared to newer technology like predictive coding. Considering keyword search technology is still widely used in eDiscovery, a ruling by Judge Nolan requiring defendants to redo part of their production using technology other than keyword searches would sound alarm bells for many litigators.

The controversy surrounding Kleen Products relates both to Plaintiffs’ position and to the status of discovery in the case. Plaintiffs initially asked Judge Nolan to order Defendants to redo their previous productions and all future productions using alternative technology. The request was surprising to many observers because some Defendants had already spent thousands of hours reviewing and producing in excess of one million documents. That number has since surpassed three million documents. Among other things, Plaintiffs claim that if Defendants had used “Content Based Advanced Analytics” tools (a term they did not define) such as predictive coding technology, then their production would have been more thorough. Notably, Plaintiffs do not appear to point to any instances of specific documents missing from Defendants’ productions.

In response, Defendants countered that their use of keyword search technology and their eDiscovery methodology in general was extremely rigorous and thorough. More specifically, they highlighted their use of advanced culling and analysis tools (such as domain filtering and email threading) in addition to keyword search tools. Defendants also claim they cooperated with Plaintiffs by allowing them to participate in the selection of keywords used to search for relevant documents. Perhaps going above and beyond the eDiscovery norm, the Defendants even instituted a detailed document sampling approach designed to measure the quality of their document productions.

Following two full days of expert witness testimony regarding the adequacy of Defendants’ productions, Judge Nolan finally asked the parties to try to reach a compromise on the “Boolean” keyword approach. She apparently reasoned that having the parties work out a mutually agreeable approach based on what Defendants had already implemented was preferable to scheduling yet another full day of expert testimony — even though additional expert testimony is still an option.

In a nod to the Sedona Principles, she further explained her rationale on March 28, 2012, at the conclusion of the second day of testimony:

“the defendants had done a lot of work, the defendant under Sedona 6 has the right to pick the [eDiscovery] method. Now, we all know, every court in the country has used Boolean search, I mean, this is not like some freak thing that they [Defendants] picked out…”

Judge Nolan’s reliance on the Sedona Best Practices Recommendations & Principles for Addressing Electronic Document Production reveals how she would likely rule if Plaintiffs renew their position that Defendants should have used predictive coding or some other kind of technology in lieu of keyword searches. Sedona Principle 6 states that:

“[r]esponding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

In other words, Judge Nolan confirmed that in her court, opposing parties typically may not dictate what technology solutions their opponents must use without some indication that the technology or process used failed to yield accurate results. Judge Nolan also observed that quality and accuracy are key guideposts regardless of the technology utilized during the eDiscovery process:

“what I was learning from the two days, and this is something no other court in the country has really done too, is how important it is to have quality search. I mean, if we want to use the term “quality” or “accurate,” but we all want this…– how do you verify the work that you have done already, is the way I put it.”

Although Plaintiffs have reserved their right to reintroduce their technology arguments, recent transcripts suggest that Defendants will not be required to use different technology. Plaintiffs continue to meet and confer with individual Defendants to agree on keyword searches, as well as the types of data sources that must be included in the collection. The parties and Judge also appear to agree that they would like to continue making progress with 30(b)(6) depositions and other eDiscovery issues before Judge Nolan retires in a few months, rather than begin a third day of expert hearings regarding technology related issues. This appears to be good news for the Judge and the parties since the eDiscovery issues now seem to be headed in the right direction as a result of mutual cooperation between the parties and some nudging by Judge Nolan.

There is also good news for outside observers in that Judge Nolan has provided some sage guidance to help future litigants before she steps down from the bench. For example, it is clear that Judge Nolan and other judges continue to emphasize the importance of cooperation in today’s complex new world of technology. Parties should be prepared to cooperate and be more transparent during discovery given the judiciary’s increased reliance on the Sedona Cooperation Proclamation. Second, Kleen Products illustrates that keyword search is not dead. Instead, keyword search should be viewed as one of many tools in the Litigator’s Toolbelt™ that can be used with other tools such as email threading, advanced filtering technology, and even predictive coding tools.  Finally, litigators should take note that regardless of the tools they select, they must be prepared to defend their process and use of those tools or risk the scrutiny of judges and opposing parties.

Gartner’s “2012 Magic Quadrant for E-Discovery Software” Provides a Useful Roadmap for Legal Technologists

Tuesday, May 29th, 2012

Gartner has just released its 2012 Magic Quadrant for E-Discovery Software, an annual report that analyzes the state of the electronic discovery industry and provides a detailed vendor-by-vendor evaluation. For many, particularly those in IT circles, Gartner is an unwavering north star used to divine software market leaders, on topics ranging from business intelligence platforms to wireless LAN infrastructure. When IT professionals are on the cusp of procuring complex software, they look to analysts like Gartner for quantifiable and objective recommendations – as a way to inform and buttress their own internal decision-making processes.

But for some in the legal technology field (particularly attorneys), looking to Gartner for software analysis can seem a bit foreign. Legal practitioners are often more comfortable with the “good ole days” when the only navigation aid in the eDiscovery world was provided by the dynamic duo of George Socha and Tom Gelbmann, who (beyond creating the EDRM) were pioneers of the first eDiscovery rankings survey. Albeit somewhat short-lived, their Annual Electronic Discovery[i] Survey ranked the hundreds of eDiscovery providers and bucketed the top-tier players in both software and litigation support categories. The scope of their mission was grand, and they were perhaps ultimately undone by the breadth of their task (stopping the Survey in 2010), particularly as the eDiscovery landscape continued to mature, fragment and evolve.

Gartner, which has perfected the analysis of emerging software markets, appears to have taken on this challenge with an admittedly more narrow (and likely more achievable) focus. Gartner published its first Magic Quadrant (MQ) for the eDiscovery industry last year, and in the 2012 Magic Quadrant for E-Discovery Software report they’ve evaluated the top 21 electronic discovery software vendors. As with all Gartner MQs, their methodology is rigorous; in order to be included, vendors must meet quantitative requirements in market penetration and customer base and are then evaluated upon criteria for completeness of vision and ability to execute.

By eliminating the legion of service providers and law firms, Gartner has made their mission both more achievable and perhaps (to some) less relevant. When talking to certain law firms and litigation support providers, some seem to treat the Gartner initiative (and subsequent Magic Quadrant) like a map from a land they never plan to visit. But, even if they’re not directly procuring eDiscovery software, the Gartner MQ should still be seen by legal technologists as an invaluable tool to navigate the perils of the often confusing and shifting eDiscovery landscape – particularly with the rash of recent M&A activity.

Beyond the quadrant positions[ii], comprehensive analysis and secular market trends, one of the key underpinnings of the Magic Quadrant is that the ultimate position of a given provider is in many ways an aggregate measurement of overall customer satisfaction. Similar in ways to the net promoter concept (which is a tool to gauge the loyalty of a firm’s customer relationships simply by asking how likely that customer is to recommend a product/service to a colleague), the Gartner MQ can be looked at as the sum total of all customer experiences.[iii] As such, this usage/satisfaction feedback is relevant even for parties that aren’t purchasing or deploying electronic discovery software per se. Outside counsel, partners, litigation support vendors and other interested parties may all end up interacting with a deployed eDiscovery solution (particularly when such solutions have expanded their reach as end-to-end information governance platforms), and they should want their chosen solution to be used happily and seamlessly in a given enterprise. There’s no shortage of stories about unhappy outside counsel (for example) that complain about being hamstrung by a slow, first-generation eDiscovery solution that ultimately makes their job harder (and riskier).

Next, the Gartner MQ is also a good shorthand way to understand more nuanced topics like time to value and total cost of ownership. While of course related to overall satisfaction, the Magic Quadrant does indirectly address whether the software does what it says it will (delivering on the promise) in the time frame that is claimed (delivering on that promise in a reasonable time frame), since these elements are typically subsumed in the satisfaction metric. This kind of detail is disclosed in the numerous interviews Gartner conducts to go behind the scenes, querying usage and overall satisfaction.

While no navigation aid ensures that a traveler won’t get lost, the Gartner Magic Quadrant for E-Discovery Software is a useful map of the electronic discovery software world. And, particularly looking at year-over-year trends, the MQ provides a useful way for legal practitioners (beyond the typical IT users) to get a sense of the electronic discovery market landscape as it evolves and matures. After all, staying on top of the eDiscovery industry has a range of benefits beyond just software procurement.

Please register here to access the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.



[i] Note, in the good ole days folks still used two words to describe eDiscovery.

[ii] Gartner has a proprietary matrix that it uses to place the entities into four quadrants: Leaders, Challengers, Visionaries and Niche Players.

[iii] Under the Ability to Execute axis Gartner weighs a number of factors including “Customer Experience: Relationships, products and services or programs that enable clients to succeed with the products evaluated. Specifically, this criterion includes implementation experience, and the ways customers receive technical support or account support. It can also include ancillary tools, the existence and quality of customer support programs, availability of user groups, service-level agreements and so on.”