
Archive for the ‘search’ Category

Music piracy the least of your audio worries; Dodd–Frank forces a closer listen

Wednesday, December 11th, 2013

We’re quickly approaching another milestone in the epic implementation of the Commodity Futures Trading Commission (CFTC) rules associated with the Dodd-Frank Wall Street Reform and Consumer Protection Act (DFA): the expiration of a very contentious exemptive order that provided relief to cross-border swap dealers (SDs), major swap participants (MSPs) and foreign groups of US SDs and MSPs. If you follow the heated debate between Wall Street and the CFTC, it is quite fitting that the order happens to expire on the winter solstice, December 21st, 2013. Let’s hope the day on which the sun comes to a standstill in the sky before reversing direction doesn’t forebode a similar experience in the cross-border free markets.

The 848 pages of Dodd-Frank legislation have resulted in (at current count) 67 new rules, exemptive orders and pieces of guidance, plus five ‘other’ actions, from the CFTC – the regulatory body tasked with enforcing Title VII of the DFA. Prior to the DFA, the CFTC averaged about four rules per year. eDiscovery nerds will appreciate that the complexity and length of the rules issued by the CFTC require a website offering proximity and Boolean search options just to navigate them. Within these 67 rules are critical adjustments to the way that organizations subject to the CFTC’s scope need to capture, store, manage, search and produce information related to the many flavors of swaps – basically, derivatives by which counterparties exchange the cash flows of one financial instrument for those of another. That information includes all data concerning the swap and the communications leading up to its execution, including any voicemails or phone conversations containing relevant information.

While audio discovery is nothing new, especially with regard to criminal investigations, these new regulations, rules and guidance have elevated audio data into the category of critical content sources for many enterprises. Let’s discuss what that means for the eDiscovery technology world.

1. Audio search is now must-have eDiscovery functionality

If your organization is categorized as a swap data repository, derivatives clearing organization, designated contract market, swap execution facility, swap dealer, major swap participant or non-MSP counterparty (where most organizations outside financial services will be categorized), you are now subject to new rules for swap record keeping.

First, covered organizations must retain the following:

“…all oral and written communications provided or received concerning quotes, solicitations, bids, offers, instructions, trading, and prices, that lead to the conclusion of a related cash or forward transaction, whether communicated by telephone, voicemail, facsimile, instant messaging, chat rooms, electronic mail, mobile device, or other digital or electronic media.” 77 Fed. Reg. 17 CFR Part 45 (December 8 2010)

Second, this data comes with specific retention and retrieval requirements. At Symantec, we’re keeping track by categorizing them into the 5 & 5, 5 & 3 and 1 & 5 rules (a rough sketch of these rules as a lookup table follows the list):

  • All of the data above, except audio files, must be retained for a period of 5 years after termination of the underlying swap.
  • For SDs and MSPs, it must be retrievable and producible within 3 days.
  • For non-MSP counterparties, it must be retrievable and producible within 5 days.
  • Audio files must be kept for a period of 1 year after termination of the swap and must also be retrievable and producible within 5 days.
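
To make these timelines easier to operationalize, the rough sketch below (in Python) expresses them as a simple lookup table with helpers for computing retention and production deadlines. The category names, calendar-day arithmetic and leap-day handling are simplifying assumptions for illustration only; the actual rule text and your counsel’s reading of it should always control.

```python
from datetime import date, timedelta

# Hypothetical encoding of the swap recordkeeping timelines described above.
# Retention runs from termination of the underlying swap; production windows are
# modeled here as calendar days for simplicity, which is an assumption of this
# sketch rather than a reading of the regulation.
RETENTION_RULES = {
    "swap_records_sd_msp":  {"retention_years": 5, "produce_within_days": 3},
    "swap_records_non_msp": {"retention_years": 5, "produce_within_days": 5},
    "audio_recordings":     {"retention_years": 1, "produce_within_days": 5},
}

def retention_end(record_type: str, swap_termination: date) -> date:
    """Approximate date through which a record must be retained."""
    years = RETENTION_RULES[record_type]["retention_years"]
    # Naive year arithmetic; leap-day edge cases are ignored in this sketch.
    return swap_termination.replace(year=swap_termination.year + years)

def production_deadline(record_type: str, request_date: date) -> date:
    """Date by which a retrieved record must be produced."""
    days = RETENTION_RULES[record_type]["produce_within_days"]
    return request_date + timedelta(days=days)

print(retention_end("audio_recordings", date(2014, 1, 15)))          # 2015-01-15
print(production_deadline("swap_records_sd_msp", date(2014, 3, 3)))  # 2014-03-06
```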

2. A turnkey ‘Dodd-Frank’ solution is unlikely, so a repeatable eDiscovery process is critical

As the CFTC rules were being finalized over the past two years, Symantec invited our customers to discuss the impact of the DFA on their eDiscovery workflows. A primary concern was the belief that the rules required organizations to have a system in place to store and eventually reproduce a trade and associated communications in their entirety. The many lobbyists and organizations that submitted grievances and clarification requests to the CFTC shared this concern. In response, the CFTC adjusted its rules to state that an organization’s swap data need not be categorized and retained in what amounts to a single-swap file, provided that all related information could be retrieved and produced from wherever it resides within the required timeframe.

Although the CFTC isn’t forcing organizations to implement some magical swap data captor, data growth, diversification and dispersion across the organization could still present major challenges to collecting, searching and producing requested swap information on an ad hoc basis. For example, sales and marketing data, research information on commodity markets, email and instant message communications, and voice data would very often be found in multiple systems.

In order to comply, organizations should evaluate whether they have the ability to collect audio files and other information in a timely manner from multiple data repositories. If the data is not retained on a per-swap basis, organizations will need to be able to consolidate all relevant communications and data into a single system so that the review is complete and auditable for requesting regulatory bodies. Pulling from these various sources, however, is likely to sweep in a large amount of non-swap data, so the ability to confidently exclude that non-swap-related information will help organizations curtail the time and costs associated with identifying the proper swap data. Finally, this process should be repeatable for each search, retrieval and production to the CFTC or Swap Data Repositories.

Side note: I’m writing with an eDiscovery-only lens, but the retention and management angle of this particular challenge lends itself to a proactive information governance discussion, one that our friends at eDiscovery Journal have touched upon already.

3. eDiscovery search capabilities must satisfy the unique nature of swap data

The DFA record keeping requirements, as they pertain to swaps, are unique in that they require capturing a combination of static, database-like structured data (trade value, time, etc.) and unstructured communications (email, Bloomberg messages, voicemail, etc.). These communications will often bridge multiple systems – for instance, multiple emails and Bloomberg IMs prior to a phone call confirming the trade. Teams reviewing data prior to production to the CFTC or Swap Data Repositories will be challenged to make sense of the entire communication thread, especially under a five-day deadline. Nor is this review process one to be taken lightly: teams need to be extra careful with the search and review of all audio content, as they risk mistakenly producing spoken information unrelated to the trade, which is not as easily identified as written content.

Organizations should consider how quickly they could get the necessary information into a searchable form. Five days to retrieve and produce is slim at best, so even audio processing advantages, like phonetic-based audio indexing as opposed to speech-to-text transcription, could be critical. They should also consider how they can organize swap communications into a coherent form – functionality like discussion threading and topic clustering can help teams quickly understand and identify communications related to a specific swap.
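
As a back-of-envelope illustration of why processing throughput matters under a five-day window, the sketch below estimates how much of the window indexing alone would consume. The backlog size and indexing rate are hypothetical placeholders, not benchmarks for any particular product.

```python
def days_to_index(audio_hours: float, hours_indexed_per_day: float) -> float:
    """Rough estimate of the days needed to make an audio backlog searchable."""
    return audio_hours / hours_indexed_per_day

# Hypothetical example: 30,000 hours of trader calls and voicemail indexed at
# 15,000 hours per day consumes two of the five days, leaving roughly three
# days for search, review and production.
backlog_hours = 30_000
throughput_hours_per_day = 15_000
indexing_days = days_to_index(backlog_hours, throughput_hours_per_day)  # 2.0
print(f"Indexing alone would consume about {indexing_days:.1f} of the 5 days")
```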

The Symantec eDiscovery team considered the Dodd-Frank Act and CFTC rules as we developed the latest release of the Clearwell eDiscovery Platform from Symantec, which now enables advanced audio processing, search and review capabilities to drastically accelerate audio discovery efforts. In addition to supporting over 400 file types for electronic discovery, these new capabilities leverage a powerful phonetic engine that can index up to 20,000 hours of recorded audio per day. Whether you are investigating voicemails, call-center recordings or financial transactions, Symantec makes it easy to find what you are looking for.

 

Breaking News: Court Orders Google to Produce eDiscovery Search Terms in Apple v. Samsung

Friday, May 10th, 2013

Apple obtained a narrow discovery victory yesterday in its long-running legal battle against fellow technology titan Samsung. In Apple Inc. v. Samsung Electronics Co. Ltd., the court ordered non-party Google to turn over the search terms and custodians that it used to produce documents in response to an Apple subpoena.

According to the court’s order, Apple argued for the production of Google’s search terms and custodians in order “to know how Google created the universe from which it produced documents.” The court noted that Apple sought such information “to evaluate the adequacy of Google’s search, and if it finds that search wanting, it then will pursue other courses of action to obtain responsive discovery.”

Google countered that argument by defending the extent of its production and the burdens that Apple’s request would place on Google as a non-party to Apple’s dispute with Samsung. Google complained that Apple’s demands were essentially a gateway to additional discovery from Google, which would arguably be excessive given Google’s non-party status.

Sensitive to the concerns of both parties, the court struck a middle ground in its order. On the one hand, the court ordered Google to produce the search terms and custodians since that “will aid in uncovering the sufficiency of Google’s production and serves greater purposes of transparency in discovery.” But on the other hand, the court preserved Google’s right to object to any further discovery efforts by Apple: “The court notes that its order does not speak to the sufficiency of Google’s production nor to any arguments Google may make regarding undue burden in producing any further discovery.”

This latest opinion from the Apple v. Samsung series of lawsuits is noteworthy for two reasons. First, the decision is instructive regarding the eDiscovery burdens that non-parties must shoulder in litigation. While the disclosure of a non-party’s underlying search methodology (in this instance, search terms and custodians) may not be unduly burdensome, further efforts to obtain non-party documents could exceed the boundaries of reasonableness that courts have designed to protect non-parties from the vicissitudes of discovery. For as the court in this case observed, a non-party “should not be required to ‘subsidize’ litigation to which it is not a party.”

Second, the decision illustrates that the use of search terms remains a viable method for searching and producing responsive ESI. Despite the increasing popularity of predictive coding technology, it is noteworthy that neither the court nor Apple took issue with Google’s use of search terms in connection with its production process. Indeed, the intelligent use of keyword searches is still an acceptable eDiscovery approach for most courts, particularly where the parties agree on the terms. That other forms of technology assisted review, such as predictive coding, could arguably be more efficient and cost effective in identifying responsive documents does not impugn the use of keyword searches in eDiscovery. Only time will tell whether the use of keyword searches as the primary means for responding to document requests will give way to more flexible approaches that include the use of multiple technology tools.

From A to PC – Running a Defensible Predictive Coding Workflow

Tuesday, September 11th, 2012

So far in our ongoing predictive coding blog series, we’ve touched on the “whys” and “whats” of predictive coding, and now I’d like to address the “hows” of using this new technology. Given that predictive coding is groundbreaking technology in the world of eDiscovery, it’s no surprise that a different workflow is required in order to run the review process.

The traditional linear review process utilizes a “brute force” approach of manually reading each document and processing it for responsiveness and privilege. In order to reduce the high cost of this process, many organizations now farm out documents to contract attorneys for review. Contract attorneys, however, often possess less expertise and knowledge of the issues, which means that multiple review passes, along with additional checks and balances, are needed to ensure review accuracy. This process commonly results in a significant number of documents being reviewed multiple times, which in turn increases the cost of review. When you step away from an “eyes-on review” of every document and use predictive coding to leverage the expertise of more experienced attorneys, you will naturally aim to review as few documents as possible while still achieving the best possible results.

How do you review the minimum number of documents with predictive coding? For starters, organizations should prepare their case by performing an early case assessment (ECA) to cull down to the review population before review begins. While some may suggest that predictive coding can be run without any ECA up front, you will actually save a significant amount of review time if you put in the effort to cull out the profoundly irrelevant documents in your case. Doing so prevents a “junk in, junk out” situation, where leaving too much junk in the case means you necessarily review junk documents throughout the predictive coding workflow.

Next, it is important to segregate documents that are unsuitable for predictive coding. Most predictive coding solutions operate on the extracted text content within documents. That means any documents that do not contain extracted text, such as photographs and engineering schematics, should be manually reviewed so they are not overlooked by the predictive coding engine. The same concept applies to any other document with limitations on review, such as encrypted and password-protected files. All of these documents should be reviewed separately so as not to miss any relevant documents.

After culling down to your review population, the next step in preparing to use predictive coding is to create a Control Set by drawing a randomly selected statistical sample from the document population. Once the Control Set is manually reviewed, it serves two main purposes. First, it allows you to estimate the population yield, otherwise referred to as the percentage of responsive documents contained within the larger population. (The size of the Control Set may need to be adjusted to ensure the yield is properly taken into account.) Second, it serves as your baseline for a true “apples-to-apples” comparison of your prediction accuracy across iterations as you move through the predictive coding workflow. The Control Set only needs to be reviewed once, up front, and is then used for measuring accuracy throughout the workflow.
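
To make the sampling math concrete, here is a minimal sketch of how a Control Set might be sized and drawn, using the standard normal-approximation formula for estimating a proportion. The 95% confidence level, 2% margin of error and placeholder tagging results are assumptions for illustration; they are not requirements of any particular predictive coding product or workflow.

```python
import math
import random

def control_set_size(confidence_z: float = 1.96,
                     margin_of_error: float = 0.02,
                     expected_yield: float = 0.5) -> int:
    """Sample size for estimating yield within +/- margin_of_error.

    Uses the normal approximation n = z^2 * p * (1 - p) / E^2; an expected
    yield of 0.5 is the conservative, largest-sample assumption.
    """
    n = (confidence_z ** 2) * expected_yield * (1 - expected_yield) / (margin_of_error ** 2)
    return math.ceil(n)

def draw_control_set(population_ids, size, seed=42):
    """Randomly sample document IDs for the Control Set."""
    return random.Random(seed).sample(population_ids, size)

population = [f"DOC-{i:06d}" for i in range(250_000)]   # hypothetical corpus
size = control_set_size()                               # 2,401 at 95% / +/-2%
control_set = draw_control_set(population, size)

# After manual review, the estimated yield is simply the responsive fraction
# of the sample; the tags below are placeholders for the reviewers' decisions.
review_tags = {doc_id: False for doc_id in control_set}
estimated_yield = sum(review_tags.values()) / len(review_tags)
```

At those settings the formula yields a Control Set of 2,401 documents regardless of corpus size, which is why a single up-front review of the Control Set can serve as the accuracy yardstick for every later iteration.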

It is essential that the documents in the Control Set are selected randomly from the entire population. While some believe that other sampling approaches give better peace of mind, they may actually result in unnecessary review. For example, other workflows recommend sampling from the documents that are not predicted to be relevant to see if anything was left behind. If you instead create a proper Control Set from the entire population, you get precision and recall metrics that are representative of the entire population – which necessarily includes the documents that are not predicted to be relevant.

Once the Control Set is created, you can begin training the software to evaluate documents against the review criteria in the case. Selecting the optimal set of documents to train the system (commonly referred to as the training set or seed set) is one of the most important steps in the entire predictive coding workflow, as it sets the initial accuracy of the system, and thus it should be chosen carefully. Some suggest creating the initial training set by taking a random sample from the population (much like how the Control Set is selected) instead of proactively selecting responsive documents. The important thing to understand, however, is that the items used for training must adequately represent the responsive items. Selecting responsive documents for inclusion in the training set matters because most eDiscovery cases have low yield – meaning the prevalence of responsive documents within the overall document population is low. If the training set does not contain enough responsive documents, the system will not be able to learn how to identify responsive items effectively.

An effective method for selecting the initial training set is to use a targeted search to locate a small set of documents (typically between 100 and 1,000) that is expected to be about 50% responsive. For example, you may choose to focus only on the key custodians in the case and use a combination of tighter keyword, date-range and similar search criteria. You do not have to perform exhaustive searches, but a high-quality initial training set will likely minimize the amount of additional training needed to achieve high prediction accuracy.
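
In practice, a targeted seed-set pull of this kind often reduces to a handful of custodian, date-range and keyword filters. The sketch below shows one hypothetical way to express such a filter; the field names, custodians and search terms are invented for illustration.

```python
from datetime import date

def candidate_seed_docs(documents, custodians, keywords, start, end, cap=1000):
    """Return up to `cap` documents matching custodian, date-range and keyword filters."""
    hits = []
    for doc in documents:
        if doc["custodian"] not in custodians:
            continue
        if not (start <= doc["sent_date"] <= end):
            continue
        text = doc["text"].lower()
        if any(keyword.lower() in text for keyword in keywords):
            hits.append(doc)
            if len(hits) >= cap:
                break
    return hits

# Hypothetical usage: tight filters aimed at a roughly 50% responsive candidate pool.
seeds = candidate_seed_docs(
    documents=[],                                   # placeholder for the processed corpus
    custodians={"jsmith", "tchan"},                 # invented key custodians
    keywords=["side letter", "rebate agreement"],   # invented case-specific terms
    start=date(2011, 1, 1),
    end=date(2011, 6, 30),
)
```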

After the initial training set is selected, it must then be reviewed. It is extremely important that the review decisions made on any training items are as accurate as possible, since the system will be learning from these items – which typically means the more experienced case attorneys should handle this review. Once review of all the training documents is finished, the system can learn from the tagging decisions and predict the responsiveness or non-responsiveness of the remaining documents.

While you could now predict on all of the other documents in the population, the most important thing at this stage is to predict on the Control Set. Not only is this more time-efficient than applying predictions to every document in the case, but you will need predictions on all of the Control Set documents in order to assess the accuracy of the predictions. With predictions and tagging decisions on each of the Control Set documents, you can compute accurate precision and recall metrics that extrapolate to the entire review population.
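
Because every Control Set document now carries both a human tag and a machine prediction, precision and recall fall out of simple counts. A minimal sketch, assuming boolean tags and predictions aligned by position:

```python
def precision_recall(tags, predictions):
    """Precision and recall from aligned human tags and machine predictions.

    tags[i] / predictions[i] are True when document i is tagged / predicted responsive.
    """
    true_pos = sum(1 for t, p in zip(tags, predictions) if t and p)
    false_pos = sum(1 for t, p in zip(tags, predictions) if p and not t)
    false_neg = sum(1 for t, p in zip(tags, predictions) if t and not p)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall

# Toy example with four Control Set documents.
human_tags = [True, True, False, False]
machine_predictions = [True, False, False, True]
p, r = precision_recall(human_tags, machine_predictions)   # p = 0.5, r = 0.5
```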

At this point, the accuracy of the predictions is unlikely to be optimal, and thus the iterative process begins. In order to increase the accuracy, you must select additional documents to use for training the system. Much like the initial training set, this additional training set must be selected carefully. The best documents to use for additional training are those that the system cannot yet accurately predict, and the software is often able to identify that set mathematically more effectively than human reviewers choosing documents by hand. Once these documents are selected, you simply continue the iterative process of training, predicting and testing until your precision and recall reach an acceptable point. Following this workflow will result in a set of documents identified as responsive by the system, along with trustworthy and defensible accuracy metrics.
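
The iteration described above can be summarized as a train/predict/measure loop with a stopping rule. The sketch below is a workflow skeleton only: the “model,” the document scores and the selection of additional training documents are toy stand-ins for whatever review platform is actually used, included just so the loop runs end to end.

```python
import random

rng = random.Random(0)

# Toy corpus: 1,000 "documents" whose true responsiveness is known here only so
# the skeleton can run end to end (about 15% yield).
truth = [rng.random() < 0.15 for _ in range(1000)]
control_ids = rng.sample(range(1000), 200)
control_tags = [truth[i] for i in control_ids]

def train(training_ids):
    """Toy 'model' that just remembers the responsive rate of its training examples."""
    rate = sum(truth[i] for i in training_ids) / max(len(training_ids), 1)
    return {"rate": rate}

def predict(model, doc_ids):
    """Toy predictions: responsive with probability equal to the learned rate."""
    return [rng.random() < model["rate"] for _ in doc_ids]

def precision_recall(tags, preds):
    tp = sum(1 for t, p in zip(tags, preds) if t and p)
    fp = sum(1 for t, p in zip(tags, preds) if p and not t)
    fn = sum(1 for t, p in zip(tags, preds) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

training_ids = rng.sample(range(1000), 100)          # initial (seed) training set
for round_no in range(1, 11):                        # cap the number of iterations
    model = train(training_ids)
    control_preds = predict(model, control_ids)
    precision, recall = precision_recall(control_tags, control_preds)
    if precision >= 0.80 and recall >= 0.75:         # assumed acceptance thresholds
        break                                        # defensible metrics reached
    # In a real platform the system would pick the documents it is least certain
    # about; here we simply add more random documents so the loop can continue.
    training_ids += rng.sample(range(1000), 100)
```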

You cannot simply produce all of these documents at this point, however. The documents must still go through a privileged screen in order to remove any documents that should not be produced, and also go through any other review measures that you usually take on your responsive documents. This does, however, open up the possibility of applying additional rounds of predictive coding on top of this set of responsive documents. For example, after running the privileged screen, you can train on the privileged tag and attempt to identify additional privileged documents in your responsive set that were missed.

The important thing to keep in mind is that predictive coding is meant to strengthen your current review workflows. While we have outlined one possible workflow that utilizes predictive coding, the flexibility of the technology lends itself to be utilized for a multitude of other uses, including prioritizing a linear review. Whatever application you choose, predictive coding is sure to be an effective tool in your future reviews.

Gartner’s 2012 Magic Quadrant for E-Discovery Software Looks to Information Governance as the Future

Monday, June 18th, 2012

Gartner recently released its 2012 Magic Quadrant for E-Discovery Software, its annual report analyzing the state of the electronic discovery industry. Many vendors in the Magic Quadrant (MQ) may initially focus on their own position and the juxtaposition of their competitive neighbors along the vision and execution axes. While that is a very useful exercise, there are also a number of additional nuggets in the MQ, particularly regarding Gartner’s overview of the market, anticipated rates of consolidation and future market direction.

Context

For those of us who’ve been around the eDiscovery industry since its infancy, it’s gratifying to see the electronic discovery industry mature.  As Gartner concludes, the promise of this industry isn’t off in the future, it’s now:

“E-discovery is now a well-established fact in the legal and judicial worlds. … The growth of the e-discovery market is thus inevitable, as is the acceptance of technological assistance, even in professions with long-standing paper traditions.”

The past wasn’t always so rosy, particularly when the market was dominated by hundreds of service providers that seemed to hold on by maintaining a few key relationships, combined with relatively high margins.

“The market was once characterized by many small providers and some large ones, mostly employed indirectly by law firms, rather than directly by corporations. …  Purchasing decisions frequently reflected long-standing trusted relationships, which meant that even a small book of business was profitable to providers and the effects of customary market forces were muted. Providers were able to subsist on one or two large law firms or corporate clients.”

Consolidation

The Magic Quadrant correctly notes that these “salad days” just weren’t feasible long term. Gartner sees the pace of consolidation heating up even further, with some players striking it rich and some going home empty handed.

“We expect that 2012 and 2013 will see many of these providers cease to exist as independent entities for one reason or another — by means of merger or acquisition, or business failure. This is a market in which differentiation is difficult and technology competence, business model rejuvenation or size are now required for survival. … The e-discovery software market is in a phase of high growth, increasing maturity and inevitable consolidation.”

Navigating these treacherous waters isn’t easy for eDiscovery providers, nor is it simple for customers to make purchasing decisions if they’re rightly concerned that the solution they buy today won’t be around tomorrow. Yet, despite the prognostication of an inevitable shakeout (Gartner forecasts that the market will shrink 25% in the raw number of firms claiming eDiscovery products/services), Gartner remains very bullish about the sector.

“Gartner estimates that the enterprise e-discovery software market came to $1 billion in total software vendor revenue in 2010. The five-year CAGR to 2015 is approximately 16%.”

This certainly means there’s a window of opportunity for certain players – particularly those who help larger players fill out their EDRM suite of offerings, since the best-of-breed era is quickly going by the wayside. Gartner notes that end-to-end functionality is now table stakes in the eDiscovery space.

“We have seen a large upsurge in user requests for full-spectrum EDRM functionality. Whether that functionality will be used initially, or at all, remains an open question. Corporate buyers do seem minded to future-proof their investments in this way, by anticipating what they may wish to do with the software and the vendor in the future.”

Information Governance

Not surprisingly, it’s this “full-spectrum” functionality that most closely aligns with marrying the reactive, right side of the EDRM with the proactive, left side.  In concert, this yin and yang is referred to as information governance, and it’s this notion that’s increasingly driving buying behaviors.

“It is clear from our inquiry service that the desire to bring e-discovery under control by bringing data under control with retention management is a strategy that both legal and IT departments pursue in order to control cost and reduce risks. Sometimes the archiving solution precedes the e-discovery solution, and sometimes it follows it, but Gartner clients that feel the most comfortable with their e-discovery processes and most in control of their data are those that have put archiving systems in place …”

As Gartner looks out five years, the analyst firm anticipates more progress on the information governance front, because the “entire e-discovery industry is founded on a pile of largely redundant, outdated and trivial data.”  At some point this digital landfill is going to burst and organizations are finally realizing that if they don’t act now, it may be too late.

“During the past 10 to 15 years, corporations and individuals have allowed this data to accumulate for the simple reason that it was easy — if not necessarily inexpensive — to do so. … E-discovery has proved to be a huge motivation for companies to rethink their information management policies. The problem of determining what is relevant from a mass of information will not be solved quickly, but with a clear business driver (e-discovery) and an undeniable return on investment (deleting data that is no longer required for legal or business purposes can save millions of dollars in storage costs) there is hope for the future.”

 

The Gartner Magic Quadrant for E-Discovery Software is insightful for a number of reasons, not the least of which is how it portrays the developing maturity of the electronic discovery space. In just a few short years, the niche has sprouted wings, raced to $1B and is seeing massive consolidation. As we enter the next phase of maturation, we’ll likely see the sector morph into a larger, information governance play, given customers’ “full-spectrum” functionality requirements and the presence of larger, mainstream software companies.  Next on the horizon is the subsuming of eDiscovery into both the bigger information governance umbrella, as well as other larger adjacent plays like “enterprise information archiving, enterprise content management, enterprise search and content analytics.” The rapid maturation of the eDiscovery industry will inevitably result in growing pains for vendors and practitioners alike, but in the end we’ll all benefit.

 

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Gartner’s “2012 Magic Quadrant for E-Discovery Software” Provides a Useful Roadmap for Legal Technologists

Tuesday, May 29th, 2012

Gartner has just released its 2012 Magic Quadrant for E-Discovery Software, an annual report that analyzes the state of the electronic discovery industry and provides a detailed vendor-by-vendor evaluation. For many, particularly those in IT circles, Gartner is an unwavering north star used to divine software market leaders, in topics ranging from business intelligence platforms to wireless LAN infrastructure. When IT professionals are on the cusp of procuring complex software, they look to analyst firms like Gartner for quantifiable and objective recommendations – a way to inform and buttress their own internal decision-making processes.

But for some in the legal technology field (particularly attorneys), looking to Gartner for software analysis can seem a bit foreign. Legal practitioners are often more comfortable with the “good ole days,” when the only navigation aid in the eDiscovery world was provided by the dynamic duo of George Socha and Tom Gelbmann, who (beyond creating the EDRM) were pioneers of the first eDiscovery rankings survey. Albeit somewhat short-lived, their Annual Electronic Discovery[i] Survey ranked the hundreds of eDiscovery providers and bucketed the top-tier players in both software and litigation support categories. The scope of their mission was grand, and they were perhaps ultimately undone by the breadth of their task (they stopped the Survey in 2010), particularly as the eDiscovery landscape continued to mature, fragment and evolve.

Gartner, which has perfected the analysis of emerging software markets, appears to have taken on this challenge with an admittedly narrower (and likely more achievable) focus. Gartner published its first Magic Quadrant (MQ) for the eDiscovery industry last year, and in the 2012 Magic Quadrant for E-Discovery Software report it evaluated the top 21 electronic discovery software vendors. As with all Gartner MQs, the methodology is rigorous: in order to be included, vendors must meet quantitative requirements for market penetration and customer base, and they are then evaluated on criteria for completeness of vision and ability to execute.

By eliminating the legion of service providers and law firms, Gartner has made their mission both more achievable and perhaps (to some) less relevant. When talking to certain law firms and litigation support providers, some seem to treat the Gartner initiative (and subsequent Magic Quadrant) like a map from a land they never plan to visit. But, even if they’re not directly procuring eDiscovery software, the Gartner MQ should still be seen by legal technologists as an invaluable tool to navigate the perils of the often confusing and shifting eDiscovery landscape – particularly with the rash of recent M&A activity.

Beyond the quadrant positions[ii], comprehensive analysis and secular market trends, one of the key underpinnings of the Magic Quadrant is that the ultimate position of a given provider is in many ways an aggregate measurement of overall customer satisfaction. Similar to the net promoter concept (a tool that gauges the loyalty of a firm’s customer relationships simply by asking how likely a customer is to recommend a product/service to a colleague), the Gartner MQ can be looked at as the sum total of all customer experiences.[iii] As such, this usage/satisfaction feedback is relevant even for parties that aren’t purchasing or deploying electronic discovery software per se. Outside counsel, partners, litigation support vendors and other interested parties may all end up interacting with a deployed eDiscovery solution (particularly as such solutions have expanded their reach as end-to-end information governance platforms), and they should want their chosen solution to be used happily and seamlessly in a given enterprise. There’s no shortage of stories about unhappy outside counsel, for example, who complain about being hamstrung by a slow, first-generation eDiscovery solution that ultimately makes their job harder (and riskier).

Next, the Gartner MQ is also a good shorthand way to understand more nuanced topics like time to value and total cost of ownership. While of course related to overall satisfaction, the Magic Quadrant does indirectly address whether the software does what it says it will (delivering on the promise) in the time frame that is claimed (delivering the promise in a reasonable time frame), since these elements are typically subsumed in the satisfaction metric. This kind of detail surfaces in the numerous interviews that Gartner conducts to go behind the scenes, querying usage and overall satisfaction.

While no navigation aid ensures that a traveler won’t get lost, the Gartner Magic Quadrant for E-Discovery Software is a useful map of the electronic discovery software world. And, particularly looking at year-over-year trends, the MQ provides a useful way for legal practitioners (beyond the typical IT users) to get a sense of the electronic discovery market landscape as it evolves and matures. After all, staying on top of the eDiscovery industry has a range of benefits beyond just software procurement.

Please register here to access the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.



[i] Note, in the good ole days folks still used two words to describe eDiscovery.

[ii] Gartner has a proprietary matrix that it uses to place the entities into four quadrants: Leaders, Challengers, Visionaries and Niche Players.

[iii] Under the Ability to Execute axis Gartner weighs a number of factors including “Customer Experience: Relationships, products and services or programs that enable clients to succeed with the products evaluated. Specifically, this criterion includes implementation experience, and the ways customers receive technical support or account support. It can also include ancillary tools, the existence and quality of customer support programs, availability of user groups, service-level agreements and so on.”

District Court Upholds Judge Peck’s Predictive Coding Order Over Plaintiff’s Objection

Monday, April 30th, 2012

In a decision that advances the predictive coding ball one step further, United States District Judge Andrew L. Carter, Jr. upheld Magistrate Judge Andrew Peck’s order in Da Silva Moore, et al. v. Publicis Groupe, et al. despite Plaintiffs’ multiple objections. Although Judge Carter rejected all of Plaintiffs’ arguments in favor of overturning Judge Peck’s predictive coding order, he did not rule on Plaintiffs’ motion to recuse Judge Peck from the current proceedings – a matter that is expected to be addressed separately at a later time. Whether or not a successful recusal motion will alter this or any other rulings in the case remains to be seen.

Finding that it was within Judge Peck’s discretion to conclude that the use of predictive coding technology was appropriate “under the circumstances of this particular case,” Judge Carter summarized Plaintiff’s key arguments listed below and rejected each of them in his five-page Opinion and Order issued on April 26, 2012.

  • the predictive coding method contemplated in the ESI protocol lacks generally accepted reliability standards;
  • Judge Peck improperly relied on outside documentary evidence;
  • Defendant MSLGroup’s (“MSL’s”) expert is biased because the use of predictive coding will reap financial benefits for his company; and
  • Judge Peck failed to hold an evidentiary hearing and adopted MSL’s version of the ESI protocol on an insufficient record and without proper Rule 702 consideration.

Since Judge Peck’s earlier order is “non-dispositive,” Judge Carter identified and applied the “clearly erroneous or contrary to law” standard of review in rejecting Plaintiffs’ request to overturn the order. Central to Judge Carter’s reasoning is his assertion that any confusion regarding the ESI protocol is immaterial because the protocol “contains standards for measuring the reliability of the process and the protocol builds in levels of participation by Plaintiffs.” In other words, Judge Carter essentially dismisses Plaintiff’s concerns as premature on the grounds that the current protocol provides a system of checks and balances that protects both parties. To be clear, that doesn’t necessarily mean Plaintiffs won’t get a second bite of the apple if problems with MSL’s productions surface.

For now, however, Judge Carter seems to be saying that although Plaintiffs must live with the current order, they are by no means relinquishing their rights to a fair and just discovery process. In fact, the existing protocol allows Plaintiffs to actively participate in and monitor the entire process closely. For example, Judge Carter writes that, “if the predictive coding software is flawed or if Plaintiffs are not receiving the types of documents that should be produced, the parties are allowed to reconsider their methods and raise their concerns with the Magistrate Judge.”

Judge Carter also specifically addresses Plaintiffs’ concerns related to statistical sampling techniques, which could ultimately prove to be their meatiest argument. A key area of disagreement between the parties is whether or not MSL is reviewing enough documents to ensure relevant documents are not completely overlooked, even if this complex process is executed flawlessly. Addressing this point, Judge Carter states that, “If the method provided in the protocol does not work or if the sample size is indeed too small to properly apply the technology, the Court will not preclude Plaintiffs from receiving relevant information, but to call the method unreliable at this stage is speculative.”

Although most practitioners are focused on seeing how these novel predictive coding issues play out, it is important not to overlook two key nuggets of information lining Judge Carter’s Opinion and Order. First, Judge Carter’s statement that “[t]here simply is no review tool that guarantees perfection” serves as an acknowledgement that “reasonableness” is the standard by which discovery should be measured, not “perfection.” Second, Judge Carter’s acknowledgement that manual review with keyword searches may be appropriate in certain situations should serve as a wake-up call for those who think predictive coding technology will replace all predecessor technologies. To the contrary, predictive coding is a promising new tool to add to the litigator’s tool belt, but it is not necessarily a replacement for all other technology tools.

Plaintiffs in Da Silva Moore may not have received the ruling they were hoping for, but Judge Carter’s Opinion and Order makes it clear that the court house door has not been closed. Given the controversy surrounding this case, one can assume that Plaintiffs are likely to voice many of their concerns at a later date as discovery proceeds. In other words, don’t expect all of these issues to fade away without a fight.

First State Court Issues Order Approving the Use of Predictive Coding

Thursday, April 26th, 2012

On Monday, Virginia Circuit Court Judge James H. Chamblin issued what appears to be the first state court Order approving the use of predictive coding technology for eDiscovery. On Tuesday, Law Technology News reported that Judge Chamblin issued the two-page Order in Global Aerospace Inc., et al. v. Landow Aviation, L.P. dba Dulles Jet Center, et al., over Plaintiffs’ objection that traditional manual review would yield more accurate results. The case stems from the collapse of three hangars at the Dulles Jet Center (“DJC”) that occurred during a major snow storm on February 6, 2010. The Order was issued at Defendants’ request after opposing counsel objected to their proposed use of predictive coding technology to “retrieve potentially relevant documents from a massive collection of electronically stored information.”

In Defendants’ Memorandum in Support of their motion, they argue that a first pass manual review of approximately two million documents would cost two million dollars and only locate about sixty percent of all potentially responsive documents. They go on to state that keyword searching might be more cost-effective “but likely would retrieve only twenty percent of the potentially relevant documents.” On the other hand, they claim predictive coding “is capable of locating upwards of seventy-five percent of the potentially relevant documents and can be effectively implemented at a fraction of the cost and in a fraction of the time of linear review and keyword searching.”

In their Opposition Brief, Plaintiffs argue that Defendants should produce “all responsive documents located upon a reasonable inquiry,” and “not just the 75%, or less, that the ‘predictive coding’ computer program might select.” They also characterize Defendants’ request to use predictive coding technology instead of manual review as a “radical departure from the standard practice of human review” and point out that Defendants cite no case in which a court compelled a party to accept a document production selected by a “’predictive coding’ computer program.”

Considering predictive coding technology is new to eDiscovery and first generation tools can be difficult to use, it is not surprising that both parties appear to frame some of their arguments curiously. For example, Plaintiffs either mischaracterize or misunderstand Defendants’ proposed workflow given their statement that Defendants want a “computer program to make the selections for them” instead of having “human beings look at and select documents.” Importantly, predictive coding tools require human input for a computer program to “predict” document relevance. Additionally, the proposed approach includes an additional human review step prior to production that involves evaluating the computer’s predictions.

On the other hand, some of Defendants’ arguments also seem to stray a bit off course. For example, Defendants seem to unduly minimize the value of using other tools in the litigator’s tool belt, like keyword search or topic grouping, to cull data prior to using potentially more expensive predictive coding technology. To broadly state that keyword searching “likely would retrieve only twenty percent of the potentially relevant documents” seems to ignore two facts. First, keyword search for eDiscovery is not dead. To the contrary, keyword searches can be an effective tool for broadly culling data prior to manual review and for conducting early case assessments. Second, the success of keyword searches and other litigation tools depends as much on the end user as on the technology. In other words, the carpenter is just as important as the hammer.

The Order issued by Judge Chamblin, the current Chief Judge for the 20th Judicial Circuit of Virginia, states that “Defendants shall be allowed to proceed with the use of predictive coding for purposes of the processing and production of electronically stored information.” In a handwritten notation, the Order further provides that the processing and production are to be completed within 120 days, with “processing” to be completed within 60 days and “production to follow as soon as practicable and in no more than 60 days.” The Order does not mention whether or not the parties are required to agree upon a mutually agreeable protocol – an issue that has plagued the court and the parties in the ongoing Da Silva Moore, et al. v. Publicis Groupe, et al. for months.

Global Aerospace is the third known predictive coding case on record, but appears to present yet another set of unique legal and factual issues. In Da Silva Moore, Judge Andrew Peck of the Southern District of New York rang in the New Year by issuing the first known court order endorsing the use of predictive coding technology.  In that case, the parties agreed to the use of predictive coding technology, but continue to fight like cats and dogs to establish a mutually agreeable protocol.

Similarly, in the 7th Federal Circuit, Judge Nan Nolan is tackling the issue of predictive coding technology in Kleen Products, LLC, et al. v. Packaging Corporation of America, et al. In Kleen, Plaintiffs basically ask that Judge Nolan order Defendants to redo their production even though Defendants have spent thousands of hours reviewing documents, have already produced over a million documents, and their review is over 99 percent complete. The parties have already presented witness testimony in support of their respective positions over the course of two full days, and more testimony may be required before Judge Nolan issues a ruling.

What is interesting about Global Aerospace is that Defendants proactively sought court approval to use predictive coding technology over Plaintiffs’ objections. This scenario is different than Da Silva Moore because the parties in Global Aerospace have not agreed to the use of predictive coding technology. Similarly, it appears that Defendants have not already significantly completed document review and production as they had in Kleen Products. Instead, the Global Aerospace Defendants appear to have sought protection from the court before moving full steam ahead with predictive coding technology and they have received the court’s blessing over Plaintiffs’ objection.

A key issue that the Order does not address is whether or not the parties will be required to decide on a mutually agreeable protocol before proceeding with the use of predictive coding technology. As stated earlier, the inability to define a mutually agreeable protocol is a key issue that has plagued the court and the parties for months in Da Silva Moore, et al. v. Publicis Groupe, et al. Similarly, in Kleen, the court was faced with issues related to the protocol for using technology tools. Both cases highlight the fact that regardless of which eDiscovery technology tools are selected from the litigator’s tool belt, the tools must be used properly in order for discovery to be fair.

Judge Chamblin left the barn door wide open for Plaintiffs to lodge future objections, perhaps setting the stage for yet another heated predictive coding battle. Importantly, the Judge issued the Order “without prejudice to a receiving party” and notes that parties can object to the “completeness or the contents of the production or the ongoing use of predictive coding technology.”  Given the ongoing challenges in Da Silva Moore and Kleen, don’t be surprised if the parties in Global Aerospace Inc. face some of the same process-based challenges as their predecessors. Hopefully some of the early challenges related to the use of first generation predictive coding tools can be overcome as case law continues to develop and as next generation predictive coding tools become easier to use. Stay tuned as the facts, testimony, and arguments related to Da Silva Moore, Kleen Products, and Global Aerospace Inc. cases continue to evolve.

Breaking News: Court Clarifies Duty to Preserve Evidence, Denies eDiscovery Sanctions Motion Against Pfizer

Wednesday, April 18th, 2012

It is fortunately becoming clearer that organizations do not need to preserve information until litigation is “reasonably anticipated.” In Brigham Young University v. Pfizer (D. Utah Apr. 16, 2012), the court denied the plaintiff university’s fourth motion for discovery sanctions against Pfizer, likely ending its chance to obtain a “game-ending” eDiscovery sanction. The case, which involves disputed claims over the discovery and development of prominent anti-inflammatory drugs, is set for trial on May 29, 2012.

In Brigham Young, the university pressed its case for sanctions against Pfizer based on a vastly expanded concept of a litigant’s preservation duty. Relying principally on the controversial Phillip M. Adams & Associates v. Dell case, the university argued that Pfizer’s “duty to preserve runs to the legal system generally.” The university reasoned that just as the defendant in the Adams case was “sensitized” by earlier industry lawsuits to the real possibility of plaintiff’s lawsuit, Pfizer was likewise put on notice of the university’s claims due to related industry litigation.

The court rejected such a sweeping characterization of the duty to preserve, opining that it was “simply too broad.” Echoing the concerns articulated by the Advisory Committee when it framed the 2006 amendments to the Federal Rules of Civil Procedure (FRCP), the court took pains to emphasize the unreasonable burdens that parties such as Pfizer would face if such a duty were imposed:

“It is difficult for the Court to imagine how a party could ever dispose of information under such a broad duty because of the potential for some distantly related litigation that may arise years into the future.”

The court also rejected the university’s argument because such a position failed to appreciate the basic workings of corporate records retention policies. As the court reasoned, “[e]vidence may simply be discarded as a result of good faith business procedures.” When those procedures operate to inadvertently destroy evidence before the duty to preserve is triggered, the court held that sanctions should not issue: “The Federal Rules protect from sanctions those who lack control over the requested materials or who have discarded them as a result of good faith business procedures.”

The Brigham Young case is significant for a number of reasons. First, it reiterates that organizations need not keep electronically stored information (ESI) for legal or regulatory purposes until a duty to preserve is triggered by reasonably anticipated litigation. As American courts have almost uniformly held since the 1997 case of Concord Boat Corp. v. Brunswick Corp., organizations are not required to keep every piece of paper, every email, every electronic document and every backup tape.

Second, Brigham Young emphasizes that organizations can and should use document retention protocols to rid themselves of data stockpiles. Absent a preservation duty or other exceptional circumstances, paring back ESI pursuant to “good faith business procedures” (such as a neutral retention policy) will be protected under the law.

Finally, Brigham Young narrows the holding of the Adams case to its particular facts. The Adams case has been particularly troublesome to organizations as it arguably expanded their preservation duty in certain circumstances. However, Brigham Young clarified that this expansion was unwarranted in the instant case, particularly given that Pfizer documents were destroyed pursuant to “good faith business procedures.”

In summary, Brigham Young teaches that organizations will be protected from eDiscovery sanctions to the extent they destroy ESI in good faith pursuant to a reasonable records retention policy. This will likely bring a sigh of relief to enterprises struggling with the information explosion since it encourages confident deletion of data when the coast is clear of a discrete litigation event.

Take Two and Call me in the Morning: U.S. Hospitals Need an Information Governance Remedy

Wednesday, April 11th, 2012

Given the vast amount of sensitive information and the legal exposure faced by hospitals today, it’s a mystery why these organizations aren’t taking advantage of enabling technologies to minimize risk. Compliance with both HIPAA and the HITECH Act is often achieved through manual, ad hoc methods, which are hazardous at best. In the past, state and federal auditing environments have not been very aggressive in ensuring compliance, but that is changing. While many hospitals have invested in high tech records management systems (EMR/EHR), those systems do not encompass the entire information and data environment within a hospital. Sensitive information often finds its way into and onto systems outside the reach of EMR/EHR systems, bringing with it increased exposure to security breaches and legal liability.

This information overload often metastasizes into email (both hospital and personal), attachments, portable storage devices, file, web and development servers, desktops and laptops, home or affiliated practices’ computers, and mobile devices such as iPads and smartphones. These avenues for the dissemination and receipt of information expand the information governance challenge and the data security risks. Surprisingly, feedback from the healthcare sector suggests that hospitals rarely get sued in federal court.

One place hospitals do not want to be is the “Wall of Shame,” otherwise known as the HHS website that, as of June 9, 2011, had detailed 281 Health Insurance Portability and Accountability Act (HIPAA) breaches, each affecting more than 500 individuals. Overall, physical theft and loss accounted for about 63% of the reported breaches, unauthorized access/disclosure accounted for another 16%, and hacking was only 6%. While Software Advice reasons that these statistics indicate physical theft has caused the majority of breaches, it should also be considered that, due to the lack of data loss prevention technology, many hospitals are unaware of breaches that have occurred and therefore cannot report them.

There are myriad reasons hospitals aren’t landing on the front page of the newspaper with the same frequency as other businesses and government agencies when it comes to security breaches, document retention and eDiscovery blunders. But the underlying contagion is not contained, and it certainly is not benign. Feedback from the field reveals some alarming symptoms of the unhealthy state of healthcare information governance, including:

  • uncontrolled .pst files
  • exploding storage growth
  • missing or incomplete data retention rules
  • doctors/nurses storing and sending sensitive data via their personal email, iPads and smartphones
  • encryption rules that rely on individuals to determine what to encrypt
  • data backup policies that differ from data retention and information governance rules
  • little to no compliance training
  • and many times non-existent data loss prevention efforts.

The result is a need for more storage, along with greater legal liability, an indefensible eDiscovery posture and an elevated risk of breach.

The reason this problem remains latent in most hospitals is that they are not yet feeling the pain of massive and multiple lawsuits, large invoices from outside law firms, or the operational challenges and costs incurred from searching through mountains of dispersed data. The symptoms are observable, the pathology is present, the problem is real, and the pain is about to acutely present itself as more states begin to deeply embrace eDiscovery requirements and government regulators increase audit frequency and fine amounts. Another less talked about reason hospitals have not faced the same pressure to search and produce their data pursuant to litigation is that cases are often settled before they even reach the discovery stage. The lack of well-developed information governance practices leads to cases being settled too soon and for too much money, when they otherwise may not have needed to settle at all.

The Patient’s Symptoms Were Treated, but the Patient’s Data Still Needs Medicine

What is still unclear is why hospitals, given their compliance requirements and tightening IT budgets, aren’t archiving, classifying, and protecting their data with the same type of innovation they are demonstrating in their cutting edge patient care technology. In this realm, two opposite ends of the IT innovation spectrum seem to co-exist in the hospital’s data environment. This dichotomy leaves much of a hospital’s data unprotected, unorganized and uncontrolled. Hospitals are experiencing increasing data security breaches and often are not aware that a breach or data loss has occurred. As more patient data is created and copied in electronic format, used in and exposed by an increasing number of systems and delivered on emerging mobile platforms, the legal and audit risks are compounding on top of a faulty or missing information governance foundation.

Many hospitals have no retention schedules or data classification rules applied to existing information, which often results in a checkbox compliance mentality and a keep-everything-forever practice. Additionally, many hospitals have no ability to apply a comprehensive legal hold across different data sources and lack technology to stop or alert them when there has been a breach.

Information Governance and Data Health in Hospitals

With the mandated push for paper to be converted to digital records, many hospitals are now evaluating the interplay of their various information management and distribution systems. They must consider the newly scanned legacy data (or soon to be scanned), and if they have been operating without an archive, they must now look to implement a searchable repository where they can collectively apply document retention and records management while decreasing the amount of storage needed to retain the data.  We are beginning to see internal counsel leading the way to make this initiative happen across business units. Different departments are coming together to pool resources in tight economic and high regulation times that require collaboration.  We are at the beginning of a widespread movement in the healthcare industry for archiving, data classification and data loss prevention as hospitals link their increasing compliance and data loss requirements with the need to optimize and minimize storage costs. Finally, it comes as no surprise that the amount of data hospitals are generating is crippling their infrastructures, breaking budgets and serving as the primary motivator for change absent lawsuits and audits.

These factors are bringing various stakeholders into the information governance conversation and painting a very clear picture: a comprehensive information governance solution is in the entire hospital's best interest. The symptoms are clear, the problem is treatable, and the prescription of information governance is well proven. Hospitals can begin by calling an information governance meeting with key stakeholders and setting an agenda around examining their data map, assessing areas of security vulnerability, and auditing the present state of compliance with healthcare regulations.

Editor’s note: This post was co-authored with Eric Heck, Healthcare Account Manager at Symantec.  Eric has over 25 years of experience in applying technology to emerging business challenges, and currently works with healthcare providers and hospitals to manage the evolving threat landscape of compliance, security, data loss and information governance within operational, regulatory and budgetary constraints.

The eDiscovery “Passport”: The First Step to Succeeding in International Legal Disputes

Monday, April 2nd, 2012

Globalization continues to erase borders throughout the world economy, and organizations now routinely conduct business in countries that were previously unknown to their industry verticals. The trend toward global integration is certain to accelerate, with reports such as the Ernst & Young 2011 Global Economic Survey confirming that 74% of companies believe globalization, particularly in emerging markets, is essential to their continued vitality.

Not surprisingly, this trend of global integration has also led to a corresponding increase in cross-border litigation. For example, parties to U.S. litigation are increasingly seeking discovery of electronically stored information (ESI) from other litigants and third parties located in Continental Europe and the United Kingdom. Since traditional methods under the Federal Rules of Civil Procedure (FRCP) may be unacceptable for discovering ESI in those forums, the question then becomes how such information can be obtained.

At this point, many clients and their counsel are unaware of how to safely navigate these international waters. For much of Europe, the short answer is to resort to the Hague Convention of March 18, 1970 on the Taking of Evidence Abroad in Civil or Commercial Matters (Hague Convention). Simply pointing to the Hague Convention, however, ignores the complexities of electronic discovery in Europe. Worse, it sidesteps the glaring knowledge gap in the United States regarding the cultural differences that distinguish European litigation from American proceedings.

The ability to bridge this gap with an awareness of the discovery processes in Europe is essential. Understanding that process is similar to holding a valid passport for international travel. Just as a passport is required for travelers to successfully cross into foreign lands, an “eDiscovery Passport™” is likewise necessary for organizations to effectively conduct cross-border discovery.

The Playing Field for eDiscovery in Continental Europe

Litigation in Continental Europe is culturally distinct from American court proceedings. "Discovery," as it is known in the United States, does not exist in Europe. Interrogatories, categorical document requests and requests for admissions are simply unavailable as European discovery devices. Instead, European countries generally allow only a limited exchange of documents, with parties typically disclosing only the information that supports their claims.

The U.S. Court of Appeals for the Seventh Circuit recently commented on this key distinction between European and American discovery when it observed that “the German legal system . . . does not authorize discovery in the sense of Rule 26 of the Federal Rules of Civil Procedure.” The court went on to explain that “[a] party to a German lawsuit cannot demand categories of documents from his opponent. All he can demand are documents that he is able to identify specifically—individually, not by category.” Heraeus Kulzer GmbH v. Biomet, Inc., 633 F.3d 591, 596 (7th Cir. 2011).

Another key distinction of discovery in Continental Europe is the lack of rules or case law requiring the preservation of ESI or paper documents. This stands in sharp contrast to American jurisprudence, which typically requires organizations to preserve information as soon as they reasonably anticipate litigation. See, e.g., Micron Technology, Inc. v. Rambus Inc., 645 F.3d 1311, 1320 (Fed. Cir. 2011). In Europe, while an implied preservation duty could arise if a court ordered the disclosure of certain materials, the penalties for non-compliance are typically not as severe as those imposed by American courts.

Only the jurisdictions of the United Kingdom, from which American notions of litigation are derived, impose discovery obligations similar to those in the United States. For example, in the combined legal system of England and Wales, a party must disclose to the other side information adverse to its claims. The courts of England and Wales also expect parties to take affirmative steps to prepare for disclosure. According to the High Court in Earles v Barclays Bank Plc [2009] EWHC 2500 (Mercantile) (08 October 2009), this includes having "an efficient and effective information management system in place to provide identification, preservation, collection, processing, review analysis and production of its ESI in the disclosure process in litigation and regulation." For organizations looking to get ahead of these issues, a strategic and intelligent information governance plan offers perhaps the best chance to do so.

Hostility to International Discovery Requests

Despite some similarities between the U.S. and the U.K., Europe as a whole retains a degree of cultural hostility to pre-trial discovery. It should therefore come as no surprise that international eDiscovery requests made pursuant to the Hague Convention are frequently denied, often because they are overly broad. Some countries, such as Italy, simply refuse to honor requests for pre-trial discovery from common law countries like the United States. Others, like Austria, are not signatories to the Hague Convention and will not accept requests made pursuant to that treaty; to obtain ESI from those countries, litigants must take their chances with the cumbersome and time-consuming process of submitting letters rogatory through the U.S. State Department. Finally, requests that seek email or other "personal information" (i.e., information that could be used to identify a person) must also satisfy a patchwork of strict European data protection rules.

Obtaining an eDiscovery Passport

This backdrop of complexity underscores the need for both lawyers and laymen to understand the basic principles governing eDisclosure in Europe. Such a task should not be seen as daunting. There are resources that provide straightforward answers to these issues at no cost to the end-user. For example, Symantec has just released a series of eDiscovery Passports™ that touch on the basic issues underlying disclosure and data privacy in the United Kingdom, France, Germany, Holland, Belgium, Austria, Switzerland, Italy and Spain. Organizations such as The Sedona Conference have also made available materials that provide significant detail on these issues, including its recently released International Principles on Discovery, Disclosure and Data Protection.

These resources can provide valuable information to clients and counsel alike and better prepare litigants for the challenges of pursuing legal rights across international boundaries. By so doing, organizations can moderate the effects of legal risk and more confidently pursue their globalization objectives.