Archive for the ‘defensible e-discovery’ Category

Computer-Assisted Review “Acceptable in Appropriate Cases,” says Judge Peck in new Da Silva Moore eDiscovery Ruling

Saturday, February 25th, 2012

The Honorable Andrew J. Peck, United States Magistrate Judge for the Southern District of New York, issued an opinion and order (order) on February 24th in Da Silva Moore v. Publicis Groupe, stating that computer-assisted review in eDiscovery is “acceptable in appropriate cases.”  The order was issued over plaintiffs’ objection that the predictive coding protocol submitted to the court will not provide an appropriate level of transparency into the predictive coding process.  This and other objections will be reviewed by the district court for error, leaving open the possibility that the order could be modified or overturned.  Regardless of whether or not that happens, Judge Peck’s order makes it clear that the future of predictive coding technology is bright, the role of other eDiscovery technology tools should not be overlooked, and the methodology for using any technology tool is just as important as the tool used.

Plaintiffs’ Objections and Judge Peck’s Preemptive Strikes

In anticipation of the district court’s review, the order preemptively rejects plaintiffs’ assertion that defendant MSL’s protocol is not sufficiently transparent.  In so doing, Judge Peck reasons that plaintiffs will be able to see how MSL codes emails.  If they disagree with MSL’s decisions, plaintiffs will be able to seek judicial intervention. (Id. at 16.)  Plaintiffs appear to argue that although this and other steps in the predictive coding protocol are transparent, the overall protocol (viewed in its entirety) is not transparent or fair.  The crux of plaintiffs’ argument is that although MSL provides a few peeks behind the curtain during this complex process, many important decisions impacting the accuracy and quality of the document production are still being made unilaterally by MSL.  Plaintiffs essentially conclude that such unilateral decision-making does not allow them to properly vet MSL’s methodology, which leads to a fox-guarding-the-henhouse problem.

Similarly, Judge Peck dismissed plaintiffs’ argument that expert testimony should have been considered during the status conference pursuant to Rule 702 and the Daubert standard.  In one of many references to his article, “Search, Forward: will manual document review and keyword searches be replaced by computer-assisted coding?” Judge Peck explains:

“My article further explained my belief that Daubert would not apply to the results of using predictive coding, but that in any challenge to its use, this Judge would be interested in both the process used and the results.” (Id. at 4.)

The court further hints that results may play a bigger role than science:

“[I]f the use of predictive coding is challenged in a case before me, I will want to know what was done and why that produced defensible results. I may be less interested in the science behind the “black box” of the vendor’s software than in whether it produced responsive documents with reasonably high recall and high precision.” (Id.)

Judge Peck concludes that Rule 702 and Daubert are not applicable to how documents are searched for and found in discovery.  Instead, both deal with the “trial court’s role as gatekeeper to exclude unreliable testimony from being submitted to the jury at trial.” (Id. at 15.)  Despite Judge Peck’s comments, the waters are still murky on this point, as evidenced by differing views expressed by Judges Grimm and Facciola in O’Keefe, Equity Analytics, and Victor Stanley.  For example, in Equity Analytics, Judge Facciola addresses the need for expert testimony to support keyword search technology:

“[D]etermining whether a particular search methodology, such as keywords, will or will not be effective certainly requires knowledge beyond the ken of a lay person (and a lay lawyer) and requires expert testimony that meets the requirements of Rule 702 of the Federal Rules of Evidence.” (Id. at 333.)

Given the uncertainty regarding the applicability of Rule 702 and Daubert, it will be interesting to see if and how the district court addresses the issue of expert testimony.

What This Order Means and Does Not Mean for the Future of Predictive Coding

The order states that “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.” (Id. at 2.)  Recognizing that there have been some erroneous reports, Judge Peck went to great lengths to clarify his order and to “correct the many blogs about this case.” (Id. at 2, fn. 1.)  Some important excerpts are listed below:

The Court did not order the use of predictive coding

“[T]he Court did not order the parties to use predictive coding.  The parties had agreed to defendants’ use of it, but had disputes over the scope and implementation, which the Court ruled on, thus accepting the use of computer-assisted review in this lawsuit.” (Id.)

Computer-assisted review is not required in all cases

“That does not mean computer-assisted review must be used in all cases, or that the exact ESI protocol approved here will be appropriate in all future cases that utilize computer-assisted review.” (Id. at 25.)

The opinion should not be considered an endorsement of any particular vendors or tools

“Nor does this Opinion endorse any vendor…, nor any particular computer-assisted review tool.” (Id.)

Predictive coding technology can still be expensive

MSL wanted to only review and produce the top 40,000 documents, which it estimated would cost $200,000 (at $5 per document). (1/4/12 Conf. Tr. at 47-48, 51.)

Process and methodology are as important as the technology utilized

“As with keywords or any other technological solution to eDiscovery, counsel must design an appropriate process, including use of available technology, with appropriate quality control testing, to review and produce relevant ESI while adhering to Rule 1 and Rule 26(b)(2)(C) proportionality.” (Id.)

Conclusion

The final excerpt drives home the points made in a recent Forbes article involving this and another predictive coding case (Kleen Products).  The first point is that there is a range of technology-assisted review (TAR) tools in the litigator’s tool belt that will often be used together in eDiscovery, and predictive coding technology is one of those tools.  The second is that none of these tools will provide accurate results unless they are relatively easy to use and used properly.  In other words, the carpenter is just as important as the hammer.  Applying these guideposts and demanding cooperation and transparency between the parties will help the bench usher in a new era of eDiscovery technology that is fair and just for everyone.

Plaintiffs Object to Predictive Coding Order, Argue Lack of Transparency in eDiscovery Process

Friday, February 24th, 2012

The other shoe dropped in the Da Silva Moore v. Publicis Groupe case this week as the plaintiffs filed their objections to a preliminary eDiscovery order addressing predictive coding technology. In challenging the order issued by the Honorable Andrew J. Peck, the plaintiffs argue that the protocol will not provide an appropriate level of transparency into the predictive coding process. In particular, the plaintiffs assert that the ordered process does not establish “the necessary standards” and “quality assurance” levels required to satisfy Federal Rule of Civil Procedure 26(b)(1) and Federal Rule of Evidence 702.

The Rule 26(b) Relevance Standard

With respect to the relevance standard under Rule 26, plaintiffs maintain that there are no objective criteria to establish that defendant’s predictive coding technology will reliably “capture a sufficient number of relevant documents from the total universe of documents in existence.” Unless the technology’s “search methodologies” are “carefully crafted and tested for quality assurance,” there is risk that the defined protocol could “exclude a large number of responsive email” from the defendant’s production. This, plaintiffs assert, is not acceptable in an employment discrimination matter where liberal discovery is typically the order of the day.

Reliability under Rule 702

The plaintiffs also contend that the court abdicated its gatekeeper role under Rule 702 and the U.S. Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals by not soliciting expert testimony to assess the reliability of the defendant’s predictive coding technology. Such testimony is particularly necessary in this instance, plaintiffs argue, where the technology at issue is new and untested by the judiciary. To support their position, the plaintiffs filed a declaration from their expert witness that challenges the technology’s reliability. Relying on that declaration, the plaintiffs complain that the process lacks “explicit and defined standards.” According to the plaintiffs, such standards would typically include “calculations . . . to determine whether the system is accurate in identifying responsive documents.” They would also include “the standard of acceptance that they are trying to achieve,” i.e., whether the defendant’s “method actually works.”  Plaintiffs conclude that without such “quality assurance measurements in place to determine whether the methodology is reliable,” the current predictive coding process is “fundamentally flawed” and should be rejected.

Wait and See

Now that the plaintiffs have filed their objections, the eDiscovery world must wait and see what will happen next. The defendant will certainly respond in kind, vigorously defending the ordered process with declarations from its own experts. Whether the plaintiffs or the defendant will carry the day depends on how the district court views these issues, particularly the issue of transparency. Simply put, the question is whether the process at issue is sufficiently transparent to satisfy Rule 26 and Rule 702. That is the proverbial $64,000 question as we wait and see how this issue plays out in the courts over the coming weeks and months.

Judge Peck Issues Order Addressing “Joint Predictive Coding Protocol” in Da Silva Moore eDiscovery Case

Thursday, February 23rd, 2012

Litigation attorneys were abuzz last week when a few breaking news stories erroneously reported that The Honorable Andrew J. Peck, United States Magistrate Judge for the Southern District of New York, ordered the parties in a gender discrimination case to use predictive coding technology during discovery.  Despite early reports, the parties in the case (Da Silva Moore v. Publicis Groupe, et al.) actually agreed to use predictive coding technology during discovery – apparently of their own accord.  The case is still significant because predictive coding technology in eDiscovery is relatively new to the legal field, and many have been reluctant to embrace a new technological approach to document review due to, among other things, a lack of judicial guidance.

Unfortunately, despite this atmosphere of cooperation, the discussion stalled when the parties realized they were miles apart in terms of defining a mutually agreeable predictive coding protocol.  A February status conference transcript reveals significant confusion and complexity related to issues such as random sampling, quality control testing, and the overall process integrity.  In response, Judge Peck ordered the parties to submit a Joint Protocol for eDiscovery to address eDiscovery generally and the use of predictive coding technology specifically.

The parties submitted their proposed protocol on February 22, 2012 and Judge Peck quickly reduced that submission to a stipulation and order.  The stipulation and order certainly provides more clarity and insight into the process than the status conference transcript.  However, reading the stipulation and order leaves little doubt that the devil is in the details – and there are a lot of details.  Equally clear is the fact that the parties are still in disagreement and the plaintiffs do not support the “joint” protocol laid out in the stipulation and order.  Plaintiffs actually go so far as to incorporate a paragraph into the stipulation and order stating that they “object to this ESI Protocol in its entirety” and they “reserve the right to object to its use in the case.”

These problems underscore some of the points made in a Forbes article published earlier this week titled, “Federal Judges Consider Important Issues That Could Shape the Future of Predictive Coding Technology.”  The Forbes article relies in part on a recent predictive coding survey to make the point that, while predictive coding technology has tremendous potential, the solutions need to become more transparent and the workflows must be simplified before they go mainstream.

Survey Says… Information Governance and Predictive Coding Adoption Slow, But Likely to Gain Steam as Technology Improves

Wednesday, February 15th, 2012

The biggest legal technology event of the year, otherwise known as LegalTech New York, always seems to have a few common rallying cries and this year was no different.  In addition to cloud computing and social media, predictive coding and information governance were hot topics of discussion that dominated banter among vendors, speakers, and customers.  Symantec conducted a survey on the exhibit show floor to find out what attendees really thought about these two burgeoning areas and to explore what the future might hold.

Information Governance is critical, understood, and necessary – but it is not yet being adequately addressed.

Although 84% of respondents are familiar with the term information governance and 73% believe that an integrated information governance strategy is critical to reducing information risk and cost, only 19% have implemented an information governance solution.  These results raise the question: if information governance is critical, then why aren’t more organizations adopting information governance practices?

Perhaps the answer lies in the cross-functional nature of information governance and confusion about who is responsible for the organization’s information governance strategy.  For example, the survey also revealed that information governance is a concept that incorporates multiple functions across the organization, including email/records retention, data storage, data security and privacy, compliance, and eDiscovery.  Given the broad impact of information governance across the organization, it is no surprise that respondents also indicated that multiple departments within the organization – including Legal, IT, Compliance, and Records Management – have an ownership stake.

These results tend to suggest at least two things.  First, information governance is a concept that touches multiple parts of the organization.  Defining and implementing appropriate information governance policies across the organization should include an integrated strategy that involves key stakeholders within the organization.  Second, recognition that information governance is a common goal across the entire organization highlights the fact that technology must evolve to help address information governance challenges.

The days of relying too heavily on disconnected point solutions to address eDiscovery, storage, data security, and record retention concerns are limited as organizations continue to mandate internal cost cutting and data security measures.  Decreasing the number of point solutions an organization supports and improving integration between the remaining solutions is a key component of a good information governance strategy because it has the effect of driving down technology and labor costs.   Similarly, an integrated solution strategy helps streamline the backup, retrieval, and overall management of critical data, which simultaneously increases worker productivity and reduces organizational risk in areas such as eDiscovery and data loss prevention.

The trail that leads from point solutions to an integrated solution strategy is already being blazed in the eDiscovery space and this trend serves as a good information governance roadmap.  More and more enterprises faced with investigations and litigation avoid the cost and time of deploying point solutions to address legal hold, data collection, data processing, and document review in favor of a single, integrated, enterprise eDiscovery platform.  The resulting reduction in cost and risk is significant and is fueling support for even broader information governance initiatives in other areas.  These broader initiatives will still include integrated eDiscovery solutions, but the initiatives will continue to expand the integrated solution approach into other areas such as storage management, record retention, and data security technologies to name a few.

Despite mainstream familiarity, predictive coding technology has not yet seen mainstream adoption but the future looks promising.

Much like the term information governance, most respondents were familiar with predictive coding technology for electronic discovery, but the survey results indicated that adoption of the technology to date has been weak.  Specifically, the survey revealed that while 97% of respondents are familiar with the term predictive coding, only 12% have adopted predictive coding technology.  Another 19% are “currently adopting” or plan to adopt predictive coding technology, but the timeline for adoption is unclear.

When asked what challenges “held back” respondents from adopting predictive coding technology, most cited accuracy, cost, and defensibility as their primary concerns.  Concerns about “privilege/confidentiality” and difficulty understanding the technology were also cited as reasons impeding adoption.  Significantly, 70% of respondents believe that predictive coding technology would “go mainstream” if it were easier to use, more transparent, and less expensive. These findings are consistent with the observations articulated in my recent blog (2012:  Year of the Dragon and Predictive Coding – Will the eDiscovery Landscape Be Forever Changed?).

The survey results combined with the potential cost savings associated with predictive coding technology suggest that the movement toward predictive coding technology is gaining steam.  Lawyers are typically reluctant to embrace new technology that is not intuitive because it is difficult to defend a process that is difficult to understand.  The complexity and confusion surrounding today’s predictive coding technology were highlighted during a recent status conference in Da Silva Moore v. Publicis Groupe, et al.  The case is venued in the U.S. District Court for the Southern District of New York before Judge Andrew Peck and serves as further evidence of the growing interest in predictive coding technology.  Expect future proceedings in the Da Silva Moore case to further validate these survey results by revealing both the promise and complexity of current predictive coding technologies.  Similarly, expect next generation predictive coding technology to address current complexities by becoming easier to use, more transparent, and less expensive.

Breaking News: Federal Circuit Denies Google’s eDiscovery Mandamus Petition

Wednesday, February 8th, 2012

The U.S. Court of Appeals for the Federal Circuit dealt Google a devastating blow Monday in connection with Oracle America’s patent and copyright infringement suit against Google involving features of Java and Android. The Federal Circuit affirmed the district court’s order that a key email was not entitled to protection under the attorney-client privilege.

Google had argued that the email was privileged under Upjohn Co. v. United States, asserting that the message reflected discussions about litigation strategy between a company engineer and in-house counsel. While acknowledging that Upjohn would protect such discussions, the court rejected that characterization of the email.  Instead, the court held that the email reflected a tactical discussion about “negotiation strategy” with Google management, not an “infringement or invalidity analysis” with Google counsel.

Getting beyond the core privilege issues, Google might have avoided this dispute had it withheld the eight earlier drafts of the email that it produced to Oracle. As we discussed in our previous post, organizations conducting privilege reviews should consider using robust, next generation eDiscovery technology, such as email analytical software, that could have isolated the drafts and potentially removed them from production. Other technological capabilities, such as Near Duplicate Identification, could also have helped identify draft materials and marry them up with finals marked as privileged. As this case shows, in the fast moving era of eDiscovery, having the right technology is essential for maintaining a strategic advantage in litigation.
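To make the near-duplicate idea concrete, here is a minimal sketch of one common approach: comparing overlapping word sequences (“shingles”) with Jaccard similarity.  It is an illustration only, not a description of how any particular eDiscovery product works.  The function names, the three-word shingle size, the 0.4 threshold, and the sample messages are all assumptions made up for this example.

```python
from typing import Dict, List, Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of overlapping k-word sequences in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def find_near_duplicates(reference: str, candidates: Dict[str, str],
                         threshold: float = 0.4) -> List[str]:
    """Return ids of candidates whose similarity to the reference meets the threshold."""
    return [doc_id for doc_id, text in candidates.items()
            if jaccard(reference, text) >= threshold]


# Hypothetical documents: a final email marked privileged, a draft, and an unrelated note.
final = "please review the attached licensing proposal before the meeting on friday"
candidates = {
    "draft-1": "please review the attached licensing proposal before our meeting on friday",
    "note-1": "lunch order for the team is due by noon today",
}

print(find_near_duplicates(final, candidates))  # ['draft-1']
```

In a privilege review, a reviewer could use a list like this to pull likely drafts into the same batch as the final message so that they are coded consistently.  The threshold is a tuning choice and would need to be validated against the actual data set.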

Breaking News: Pippins Court Affirms Need for Cooperation and Proportionality in eDiscovery

Tuesday, February 7th, 2012

The long awaited order regarding the preservation of thousands of computer hard drives in Pippins v. KPMG was finally issued last week. In a sharply worded decision, the Pippins court overruled KPMG’s objections to the magistrate’s preservation order and denied its motion for protective order. The firm must now preserve the hard drives of certain former and departing employees unless it can reach an agreement with the plaintiffs on a methodology for sampling data from a select number of those hard drives.

Though it is easy to get caught up in the opinion’s rhetoric (“[i]t smacks of chutzpah (no definition required) to argue that the Magistrate failed to balance the costs and benefits of preservation . . .”), the Pippins case confirms the importance of both cooperation and proportionality in eDiscovery. With respect to cooperation, the court emphasized that parties should take reasonable positions in discovery so as to reach mutually agreeable results. The order also stressed the importance of communicating with the court to clarify discovery obligations.  In that regard, the court faulted the parties and the magistrate for not seeking the court’s clarification with respect to its prior order staying discovery. The court explained that the discovery stay – which KPMG had understood to prevent any sampling of the hard drives – could have been partially lifted to allow for sampling. And this, in turn, could have obviated the costs and delays associated with the motion practice on this matter.

Regarding proportionality, the court confirmed the importance of this doctrine in determining the scope of preservation. Indeed, the court declared that proportionality is typically “determinative” of a motion for protective order. Nevertheless, the court could not engage in a proportionality analysis – weighing the benefits of preserving the hard drives against its burdens – as the defendant had not yet produced any evidence from the hard drives to evaluate the nature of the evidence. Only after the evidence from a sampling of hard drives had been produced and evaluated could such a determination be made.

The Pippins case demonstrates that courts have raised their expectations for how litigants will engage in eDiscovery. Staking out unreasonable positions in the name of zealous advocacy stands in stark contrast to the clear trend that discovery should comply with the cost cutting mandate of Federal Rule 1. Cooperation and proportionality are two of the principal touchstones for effectuating that mandate.

The Top Ten “What NOT to Do” List for LegalTech New York 2012

Thursday, January 26th, 2012

As we approach LegalTech New York next week, oft referred to as the Super Bowl of legal technology events, there are any number of helpful blogs and articles telling new attendees what to expect, where to go, what to say, what to do. Undoubtedly, there’s some utility to this approach, but since we’ll be in New York, I think it’s appropriate to take a more skeptical approach and proffer a list of what *NOT* to do at LTNY.

  1. DON’T get caught up in Buzzword Bingo. There are already dozens of sources attempting to prognosticate what the most popular buzzwords will be at this year’s show.  Leading candidates include “predictive coding,” “technology assisted review,” “information governance,” “big data” and even the pedestrian sounding “sampling.” And, while these terms will undoubtedly be on booths and broadcast repeatedly from the Hilton elevator, it doesn’t mean an attendee should merely parrot these without a deeper dive.  Here, the key is to go behind the green curtain to see what vendors, panelists and tweet-ers actually mean by these buzzwords, since it’s often surprising to see how the devil really is in the details.
  2. DON’T get a coffee at the Hilton Starbucks. Yes, we all love our morning coffee, but there’s no need to wait in the Justin Bieber-esque queue at the in-hotel Starbucks. There are approximately 49 locations in a ½ mile radius, including one right across the street. There’s also the vendor giving out free coffee on the second floor, so save yourself 30 minutes of needless line waiting.
  3. DON’T ride the Hilton elevator. For those staying or taking meetings at the Hilton, the elevator lines can be excessively long.  Once you finally get on, you’ll wish they’d been even longer as you then find yourself subjected to the brainwashing of vendor announcements while you make multiple stops on your way to your desired floor. Either take the stairs or, if that’s not possible, try to minimize the trips to keep your sanity. Or, plan B – bring your iPod.
  4. DON’T talk to booth models. It’s tempting to gravitate to the most attractive person at a given vendor’s booth, but they’re often hired professionals designed to get you in for the all-important “badge scan.” Instead, focus on  the person who looks like they’ve been in the same company-branded oxford for 48 hours, because they probably have. While perhaps less aesthetically pleasing, they’ll certainly know more about the product and that’s why you’re there after all, isn’t it?
  5. DON’T pass out your resume on the show floor. While certainly a great networking opportunity, LTNY isn’t the place to blatantly tout your professional wares, at least if you want to keep your nascent job search on the down low. And, if you want to have more private meetings, you’ll need to do better than “hiding out” at the Warwick across the street. For more clandestine purposes, think about the Bronx.
  6. DON’T take tchotchkes without hearing the spiel. There are certain tchotchke hounds out there who roam around LTNY collecting “gifts” for the kids back at home. While I won’t frown on this behavior per se, it’s only courteous to actually listen to the pitch (as a quid pro quo) before you ask for the swag. Anything less is uncivilized.
  7. DON’T get over-served at the B-Discovery Party. After a long day on the show floor you’re probably ready to let loose with some of the eDiscovery practitioners you haven’t seen in a year.  But, in this era of flip cams and instant tweeting, letting your hair down too much can be career limiting. If you haven’t done Jägermeister shots since college, LTNY probably isn’t a good time to resume that dubious practice.
  8. DON’T forget to take your badge off (please!). Yes, it’s cool to let everyone know you’re attending the premier legal technology event of the year, but once you leave the show floor random New Yorkers will heckle you for sporting your badge after hours – particularly the baristas at Starbucks. Plus, if you’ve broken any of the other admonitions above, at least you’ll be more anonymous.
  9. DON’T forget to bring a heavy coat, mittens and scarf. Last year there was the infamous ice storm that stranded folks for days (me included). Even if the weather isn’t that severe this year, anyone from warmer climates will need to bundle up, particularly because it’s easy to unintentionally get caught outside for extended amounts of time – waiting for a cab in the Hilton queue, eating at Symantec’s free food cart, walking to a meeting at a “nearby” hotel that’s “just a block or so away.” Keep in mind those cross town blocks are longer than they appear on a map.
  10. DON’T forget to learn something. Without hyperbole, LTNY has the world’s greatest collection of legal/technology minds in one place for 3 days.  Most folks, even the vaunted panelists, judges and industry luminaries, are actually quite accessible. So, at a minimum, attend sessions, ask questions and interact with your peers. Try to ignore the bright lights and signs on the floor and make sure to take some useful information back to your firm, company or governmental agency. You’ll undoubtedly have fun (and maybe a Jägermeister shot, too) along the way.

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This blog explores the promise of predictive coding technology, why predictive coding has not been widely adopted in eDiscovery, and explains why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page by page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or daydream, many believe computers can even predict document relevance better than their human counterparts.
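The ranking step can be pictured with a short sketch.  This is not how any specific tool is implemented.  It simply assumes that some trained model has already assigned each document a relevance score between 0 and 1, and shows how a review team might split the ranked list at a review budget.  The document ids, scores, and the `prioritize` function are hypothetical.

```python
from typing import Dict, List, Tuple


def prioritize(scores: Dict[str, float], review_budget: int) -> Tuple[List[str], List[str]]:
    """Rank documents by predicted relevance and split the list at a review budget."""
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ranked[:review_budget], ranked[review_budget:]


# Hypothetical relevance scores that a trained model might produce.
scores = {"doc-a": 0.91, "doc-b": 0.12, "doc-c": 0.67, "doc-d": 0.05, "doc-e": 0.44}

to_review, deferred = prioritize(scores, review_budget=3)
print(to_review)  # ['doc-a', 'doc-c', 'doc-e']
print(deferred)   # ['doc-b', 'doc-d']
```

Whether the deferred, low-ranked documents can safely go unreviewed is a separate question that depends on sampling and quality control, not on the ranking alone.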

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user.  This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.
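As an illustration of why sample selection is not intuitive, the sketch below applies the standard textbook formula for the sample size needed to estimate a proportion, with an optional finite population correction.  It is a simplified sketch under stated assumptions and not the method used by any tool or protocol discussed here.  The 95% confidence level (z-score of 1.96), the conservative 50% expected proportion, the 5% margin of error, and the 40,000-document population are all assumptions chosen for the example.

```python
import math
from typing import Optional


def sample_size(margin_of_error: float, z_score: float = 1.96,
                expected_proportion: float = 0.5,
                population: Optional[int] = None) -> int:
    """Sample size needed to estimate a proportion within a margin of error.

    Uses n0 = z^2 * p * (1 - p) / e^2 and, when a population size is given,
    the finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    n = (z_score ** 2) * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    if population is not None:
        n = n / (1 + (n - 1) / population)
    # Round first so floating-point noise does not push the result up by one.
    return math.ceil(round(n, 6))


print(sample_size(0.05))                     # 385
print(sample_size(0.05, population=40_000))  # 381
```

Note how little the required sample shrinks when the population is large, and how quickly it grows when the margin of error is tightened.  Details like these are easy to get wrong when the tool hides the calculation from the user.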

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if the email is inadvertently produced to the opposing party.

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern what language led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive number of irrelevant documents.  This problem is generally referred to as ‘excessive recall’ (in information retrieval terms, low precision), and it is important because it increases the number of documents requiring manual review, which directly impacts eDiscovery cost.
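To make the precision problem concrete, here is a minimal sketch of how precision and recall are commonly computed for a set of retrieved documents.  The document IDs and counts are hypothetical and are not drawn from any particular review tool.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against the truly relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of retrieved documents that are actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the tool retrieves 10 documents, only 4 of which are relevant.
retrieved = {f"doc{i}" for i in range(1, 11)}
relevant = {"doc1", "doc2", "doc3", "doc4"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

In this hypothetical case every relevant document is found, but six of the ten retrieved documents still need manual review for no benefit, which is the cost driver described above.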

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology that hasn’t existed until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.
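As a rough illustration of why a confidence level can be negotiated as a number, the sketch below estimates how many documents would need to be sampled for a given confidence level and margin of error.  It assumes simple random sampling and the standard normal approximation for a proportion; the function name and the default 50% expected rate are illustrative assumptions, not the method of any specific predictive coding product.

```python
import math
from statistics import NormalDist


def sample_size(confidence, margin_of_error, expected_rate=0.5):
    """Approximate sample size for estimating a proportion (normal approximation)."""
    # Two-sided z-score for the requested confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # expected_rate=0.5 is the most conservative assumption (largest sample).
    n = z ** 2 * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    return math.ceil(n)


# Hypothetical negotiation: 95% confidence with a +/-2% margin of error.
print(sample_size(0.95, 0.02))
```

Raising the confidence level or tightening the margin of error increases the sample, so the parties are effectively negotiating a trade-off between certainty and review effort.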

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.

Losing Weight, Developing an Information Governance Plan, and Other New Year’s Resolutions

Tuesday, January 17th, 2012

It’s already a few weeks into the new year and it’s easy to spot the big lines at the gym, folks working on fad diets and many swearing off any number of vices.  Sadly perhaps, most popular resolutions don’t even really change year after year.  In the corporate world, though, it’s not good enough to simply recycle resolutions every year since there’s a lot more at stake, often with employees’ bonuses and jobs hanging in the balance.

It’s not too late to make information governance part of the corporate 2012 resolution list.  The reason is pretty simple – most companies need to get out of the reactive firefighting of eDiscovery given the risks of sloppy work, inadvertent productions and looming sanctions.  Yet, so many are caught up in the fog of eDiscovery war that they’ve failed to see the nexus between the upstream, proactive good data management hygiene and the downstream eDiscovery chaos.

In many cases the root cause is the disconnect between differing functional groups (Legal, IT, Information Security, Records Management, etc.).  This is where the emerging umbrella concept of Information Governance comes into play, serving as a way to tackle these information risks along a unified front. Gartner defines information governance as the:

“specification of decision rights, and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archiving and deletion of information, … [including] the processes, roles, standards, and metrics that ensure the effective and efficient use of information to enable an organization to achieve its goals.”

Perhaps more simply put, what were once a number of distinct disciplines—records management, data privacy, information security and eDiscovery—are rapidly coming together in ways that are important to those concerned with mitigating and managing information risk. This new information governance landscape is comprised of a number of formerly discrete categories:

  • Regulatory Risks – Whether an organization is in a heavily regulated vertical or not, there are a host of regulations that an organization must navigate to successfully stay in compliance.  In the United States these include a range of disparate regimes, including the Sarbanes-Oxley Act, HIPAA, the Securities Exchange Act, the Foreign Corrupt Practices Act (FCPA) and other specialized regulations – any number of which require information to be kept in a prescribed fashion, for specified periods of time.  Failure to turn over information when requested by regulators can have dramatic financial consequences, as well as negative impacts to an organization’s reputation.
  • Discovery Risks – Under the discovery realm there are any number of potential risks as a company moves along the EDRM spectrum (i.e., Identification, Preservation, Collection, Processing, Analysis, Review and Production), but the most lethal risk is typically associated with spoliation sanctions that arise from the failure to adequately preserve electronically stored information (ESI).  There have been literally hundreds of cases where both plaintiffs and defendants have been caught in the judicial crosshairs, resulting in penalties ranging from outright case dismissal to monetary sanctions in the millions of dollars, simply for failing to preserve data properly.  It is in this discovery arena that the failure to dispose of corporate information, where possible, rears its ugly head since the eDiscovery burden is commensurate with the amount of data that needs to be preserved, processed and reviewed.  Some statistics show that it can cost as much as $5 per document just to have an attorney privilege review performed.  And, with every gigabyte containing upwards of 75,000 pages, it is easy to see massive discovery liability when an organization has terabytes and even petabytes of extraneous data lying around.
  • Privacy Risks – Even though the US has a relatively lax information privacy climate there are any number of laws that require companies to notify customers if their personally identifiable information (PII) such as credit card, social security, or credit numbers have been compromised.  For example, California’s data breach notification law (SB1386) mandates that all subject companies must provide notification if there is a security breach to the electronic database containing PII of any California resident.  It is easy to see how unmanaged PII can increase corporate risk, especially as data moves beyond US borders to the international stage where privacy regimes are much stricter.
  • Information Security Risks – Data breaches have become so commonplace that the loss/theft of intellectual property has become an issue for every company, small and large, both domestically and internationally.  The cost to businesses of unintentionally exposing corporate information climbed 7 percent last year to over $7 million per incident.  Recently senators asked the SEC to “issue guidance regarding disclosure of information security risk, including material network breaches” since “securities law obligates the disclosure of any material network breach, including breaches involving sensitive corporate information that could be used by an adversary to gain competitive advantage in the marketplace, affect corporate earnings, and potentially reduce market share.”  The senators cited a 2009 survey that concluded that 38% of Fortune 500 companies made a “significant oversight” by not mentioning data security exposures in their public filings.

Information governance as an umbrella concept helps organizations to create better alignment between functional groups as they attempt to solve these complex and interrelated data risk challenges.  This coordination is even more critical given the way that corporate data is proliferating and migrating beyond the firewall.  With even more data located in the cloud and on mobile devices a key mandate is managing data in all types of form factors. A great first step is to determine ownership of a consolidated information governance approach where the owner can:

  • Get C-Level buy-in
  • Have the organizational savvy to obtain budget
  • Be able to define “reasonable” information governance efforts, which requires both legal and IT input
  • Have strong leadership and consensus building skills, because all stakeholders need to be on the same page
  • Understand the nuances of their business, since an overly rigid process will cause employees to work around the policies and procedures

Next, tap into and then leverage IT or information security budgets for archiving, compliance and storage.  In most progressive organizations there are likely ongoing projects that can be successfully massaged into a larger information governance play.  A great place to focus on initially is information archiving, since this is one of the simplest steps an organization can take to improve its information governance hygiene.  With an archive, organizations can systematically index, classify and retain information and thus establish a proactive approach to data management.  It’s this ability to apply retention and (most importantly) expiration policies that allows organizations to start reducing the upstream data deluge that will inevitably impact downstream eDiscovery processes.

Once an archive is in place, the next logical step is to couple a scalable, reactive eDiscovery process with the upstream data sources, which will axiomatically include email, but increasingly should encompass cloud content, social media, unstructured data, etc.  It is important to make sure that a given archive has been tested to ensure compatibility with the chosen eDiscovery application to guarantee that it can collect content at scale in the same manner used to collect from other data sources.  Overlaying both of these foundational pieces should be the ability to place content on legal hold, whether that content exists in the archive or not.
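The interplay between expiration policies and legal holds can be sketched in a few lines.  This is only an illustration under stated assumptions: the `ArchivedItem` type, the category names and the retention periods are hypothetical and do not reflect any particular archiving product or regulatory schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    item_id: str
    category: str
    created: date
    on_legal_hold: bool = False


# Hypothetical retention schedule, in days, keyed by classification.
RETENTION_DAYS = {"email": 3 * 365, "financial_record": 7 * 365}


def is_expired(item, today):
    """Return True if the item may be disposed of under the retention schedule."""
    if item.on_legal_hold:
        return False  # a legal hold always overrides expiration
    days = RETENTION_DAYS.get(item.category)
    if days is None:
        return False  # unclassified content is kept until it is classified
    return today > item.created + timedelta(days=days)


today = date(2012, 1, 17)
old_email = ArchivedItem("m1", "email", date(2008, 1, 1))
held_email = ArchivedItem("m2", "email", date(2008, 1, 1), on_legal_hold=True)
print(is_expired(old_email, today), is_expired(held_email, today))
```

The key design point is the ordering of the checks: the hold is evaluated before any retention rule, so content subject to preservation is never expired by routine disposal.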

As we enter 2012, there is no doubt that information governance should be an element in building an enterprise’s information architecture.  And, different from fleeting weight loss resolutions, savvy organizations should vow to get ahead of the burgeoning categories of information risk by fully embracing their commitment to integrated information governance.  And yet, this resolution doesn’t need to encompass every possible element of information governance.  Instead, it’s best to put foundational pieces into place and then build the rest of the infrastructure in methodical and modular fashion.

Information Governance Gets Presidential Attention: Banking Bailout Cost $4.76 Trillion, Technology Revamp Approaches $240 Billion

Tuesday, January 10th, 2012

On November 28, 2011, The White House issued a Presidential Memorandum that outlines what is expected of the 480 federal agencies of the government’s three branches in the next 240 days.  Up until now, Washington, D.C. has been the Wild West with regard to information governance as each agency has often unilaterally adopted its own arbitrary policies and systems.  Moreover, some agencies have recently purchased differing technologies.  Unfortunately, with the President’s ultimate goal of uniformity, this centralization will be difficult to accomplish with a range of disparate technological approaches.

Particular pain points for the government traditionally include retention, search, collection, review and production of vast amounts of data and records.  Specifically, these pain points include examples of: FOIA requests gone awry, the issuance of legal holds across different agencies leading to spoliation, and the ever present problem of decentralization.

Why is the government different?

Old Practices. First, in some instances the government is technologically behind (its corporate counterparts) and is failing to meet the judiciary’s expectation that organizations effectively store, manage and discover their information.  This failing is self-evident from the directive coming from the President mandating that these agencies start to get a plan to attack this problem.  Though different from corporate entities, the government is nevertheless held to the same standards of eDiscovery under the Federal Rules of Civil Procedure (FRCP).  In practice, the government has been given more leniency until recently, and while equal expectations have not always been the case, the gap between the private and public sectors is no longer possible to ignore.

FOIA.  The government’s arduous obligation to produce information under the Freedom of Information Act (FOIA) has no corresponding analog for private organizations, which are responding to more traditional civil discovery requests.  Because the government is so large with many disparate IT systems, it is cumbersome to work efficiently through the information governance process across agencies and many times still difficult inside one individual agency with multiple divisions.  Executing this production process is even more difficult, if not impossible, to do manually without properly deployed technology.  Additionally, many of the investigatory agencies that issue requests to the private sector need more efficient ways to manage and review data they are requesting.  To compound problems, within the US government there are two opposing interests at play, both screaming for a resolution, and that solution needs to be centralized.  On the one hand, the government needs to retain more than a corporation may need to in order to satisfy a FOIA request.

Titan Pulled at Both Ends. On the other hand, without classification of the records that are to be kept, technology to organize this vast amount of data and some amount of expiry, every agency will essentially become its own massive repository.  The “retain everything” mentality coupled with the inefficient search and retrieval of data and records is where agencies stand today.  Corporations are experiencing this on a smaller scale today and many are collectively further along than the government in this process, without the FOIA complications.

What are agencies doing to address these mandates?

In their plans, agencies must describe how they will improve or maintain their records management programs, particularly with regard to email, social media and other electronic communications.  They must also move away from such a paper-centric existence.  eDiscovery consultants and software companies are helping agencies through this process, essentially writing their plans to match the President’s directive.  The cloud conversation has been revisited, and agencies also have to explain how they will use cloud-based services and storage solutions, as well as identify gaps in existing laws or regulations that presently prevent improved management.  Small innovations are taking place.  In fact, just recently the DOJ added a new search feature on their website to make it easier for the public to find documents that have been posted by agencies on their websites.

The Office of Management and Budget (OMB), National Archives and Records Administration (NARA), and Justice Department will use those reports to come up with a government-wide records management framework that is more efficient, maintains accountability by documenting agency actions and promotes “appropriate” public access to records.  Hopefully, the framework they come up with will be centralized and workable on a realistic timeframe with resources sufficiently allocated to the initiative.

How much will this cost?

The President’s mandate is a great initiative and very necessary, but one cannot help but think about the costs in terms of money, time and resources when considering these crucial changes.  The most recent version of a financial services and general government appropriations bill in the Senate extends $378.8 million to NARA for this initiative.  President Obama appointed Steven VanRoekel as the United States CIO in August 2011 to succeed Vivek Kundra.  After VanRoekel’s speech at the Churchill Club in October of 2011, an audience member asked him what the most surprising aspect of his new job was.  VanRoekel said that it was managing the huge and sometimes unwieldy resources of his $80 billion budget.  It is going to take even more than this to do the job right, however.

Using conservative estimates, assume that the initial investment for an agency to implement archiving and eDiscovery capabilities would be $100 million.  That approximates $480 billion for all 480 agencies.  Assume a uniform information governance platform gets adopted by all agencies at a 50% discount due to the large contracts and also factoring in smaller sums for agencies with lesser needs.  The total now comes to $240 billion.  For context, that figure is 5% of what was spent by the federal government ($4.76 trillion) on the biggest bailout in history in 2008. Measured against VanRoekel’s $80 billion budget, that leaves a need for $160 billion more to get the job done. VanRoekel also commented at the same meeting that he wants to break down massive multi-year information technology projects into smaller, more modular projects in the hopes of saving the government from getting mired in multi-million dollar failures.  His solution to this, he says, is modular and incremental deployment.
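The chain of estimates above can be checked with a short calculation.  The sketch below starts from the stated $480 billion total and uses only figures given in the text; the variable names are illustrative.  Note that spreading $480 billion across 480 agencies works out to roughly $1 billion per agency, which the calculation makes explicit.

```python
AGENCIES = 480
STATED_TOTAL = 480_000_000_000      # $480 billion before any discount
DISCOUNT = 0.50                     # assumed volume discount
BAILOUT = 4_760_000_000_000         # $4.76 trillion
CIO_BUDGET = 80_000_000_000         # $80 billion budget cited by VanRoekel

implied_per_agency = STATED_TOTAL // AGENCIES       # per-agency cost implied by the total
discounted_total = int(STATED_TOTAL * (1 - DISCOUNT))
share_of_bailout = discounted_total / BAILOUT
shortfall = discounted_total - CIO_BUDGET

print(f"Implied cost per agency: ${implied_per_agency / 1e9:.0f} billion")
print(f"Discounted total:        ${discounted_total / 1e9:.0f} billion")
print(f"Share of bailout:        {share_of_bailout:.1%}")
print(f"Shortfall vs. budget:    ${shortfall / 1e9:.0f} billion")
```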

While Rome was not built in a day, this initiative is long overdue, yet feasible, as technology exists to address these challenges rather quickly.  After these 240 days are complete and a plan is drawn, the real question is: how are we going to pay now for technology the government needed yesterday?  In a perfect world, the government would select a platform for archiving and eDiscovery, break the project into incremental milestones and roll out a uniform combination of solutions that are best of breed in their expertise.