Posts Tagged ‘discovery’

Apple, Code Name K48 and E-Discovery

Wednesday, June 22nd, 2011

According to a complaint filed by the U.S. government, the FBI secretly recorded an employee at one of Apple’s suppliers passing confidential information about the soon-to-be-released Apple iPad in an October 2009 telephone conversation. The recording, along with other evidence, led to the arrest of the employee and others on charges of wire fraud and conspiracy to commit securities fraud on December 16, 2010, as part of a major insider-trading investigation. In the conversation, a director for Flextronics named Walter Shimoon is heard saying:

“they [Apple] have a code name for something new … It’s … It’s totally … It’s a new category altogether… It doesn’t have a camera, what I figured out. So I speculated that it’s probably a reader. … Something like that. Um, let me tell you, it’s a very secretive program … It’s called K, K48. That’s the internal name. So, you can get, at Apple you can get fired for saying K48.”

Four months later, the first Apple iPad, code named K48, was unveiled to the public. To read more about the case background, see the press release issued by the U.S. Attorney’s Office on December 16, 2010.

The case is interesting from an eDiscovery standpoint because it highlights the challenge of finding critical evidence in an investigation or lawsuit when people are intentionally using code words to hide information. Finding or overlooking important documents that have been disguised can make or break your case, so determining whether or not key players are using code words is an important part of a thorough investigation. Equally important is segregating relevant from irrelevant documents quickly, before key evidence is lost or destroyed, without having to conduct a painstaking page-by-page review of each document.

How Does Technology Help?

The good news is that even though technology innovation has resulted in massive data growth requiring the review and analysis of more documentary evidence during lawsuits and investigations, advances in eDiscovery technology have also made sifting through this information faster and easier.  In other words, technology can help solve the data growth problem technology created.

One of the newest advances is the use of “transparent concept search” technology to find important electronic files in lieu of basic “keyword” or “traditional” concept searching technology.  In many situations investigators or lawyers simply aren’t aware code words are being used to hide activity, so critical evidence is often overlooked.  For example, in the present case assume the investigator is unaware that “K48” is the internal code name used for the first iPad.  A simple keyword search for the term “iPad” may not retrieve critical documents about the “iPad” because the code name K48 is being used to disguise the product name.  If this is the only search methodology used, information could easily be overlooked during the investigation due to the limitations of simple keyword search technology.

On the other hand, running the same search using a traditional concept searching tool is likely to retrieve documents containing the word “iPad” as well as other conceptually related documents. The problem is that the user has no ability to control the breadth of the search using traditional concept searching technology. That means even though a traditional concept search for the term “iPad” is likely to include documents containing the terms “K48” and “iPad,” it is also likely to retrieve a large number of irrelevant documents containing terms like “iPod,” “iTouch” and “iTunes” that may appear to be conceptually related to the search term “iPad.” The problem may seem trivial initially, but when investigators are required to read hundreds or thousands of irrelevant documents about the iPod, iTouch or iTunes in an effort to find relevant documents about the iPad, the time and cost of the investigation can skyrocket.
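The difference between the two approaches can be made concrete with a small sketch. The documents, the RELATED concept table, and the function names below are all hypothetical, invented for illustration; a real concept search engine would derive related terms statistically from the corpus rather than from a hand-written table.

```python
# Toy corpus (hypothetical documents, invented for illustration).
DOCS = {
    1: "The iPad launch date is confidential",
    2: "K48 has no camera and is probably a reader",
    3: "iPod sales were strong this quarter",
    4: "iTunes update ships next week",
}

# Hand-written stand-in for a related-concept model (an assumption).
RELATED = {"ipad": ["k48", "ipod", "itunes"]}


def keyword_search(term, docs):
    """Return ids of documents containing the literal term (case-insensitive)."""
    term = term.lower()
    return sorted(i for i, text in docs.items() if term in text.lower().split())


def concept_search(term, docs, related):
    """Search for the term plus every related concept, with no user control."""
    terms = [term.lower()] + related.get(term.lower(), [])
    hits = set()
    for t in terms:
        hits.update(keyword_search(t, docs))
    return sorted(hits)


print(keyword_search("iPad", DOCS))           # [1]
print(concept_search("iPad", DOCS, RELATED))  # [1, 2, 3, 4]
```

The keyword search misses document 2, which uses only the code name, while the uncontrolled concept search finds it but also pulls in the iPod and iTunes documents.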

Next Generation Transparent Concept Search Technology

To solve this problem, next generation transparent concept search technology takes traditional concept searching a step further by empowering investigators to reap the advantages of traditional concept searching while actually reducing instead of increasing e-discovery expenses.  The secret is that transparent concept searching technology significantly reduces the time and expense resulting from over-inclusive document retrieval by allowing users to eliminate documents containing concepts that are not relevant to the intended search.  This is accomplished by providing a transparent view of concepts related to a search so that users can actually visualize and select (or deselect) the range of concepts to be included in a search before the search is executed.

For example, using transparent concept search technology to search for the term “iPad” would reveal conceptually related terms like “K48” just like traditional concept searching. However, a transparent concept search would also provide a list of all concepts related to the keyword “iPad” prior to the search, such as “K48,” “iPod,” “iTouch,” “Shimoon,” “iTunes,” etc. Prior to executing the search, the user could de-select irrelevant concepts and limit the search to “iPad”, “Shimoon”, “internal” and “K48” to make sure only the most relevant documents are retrieved. (See Figure 1). In addition to decreasing the cost associated with segregating relevant and irrelevant documents, the transparent approach to concept searching results in strategic advantages for investigators and legal teams because the most relevant evidence is found quickly so cases can be assessed faster, with more accuracy, and before evidence disappears.

Figure 1: Transparent concept search reveals all concepts related to the keyword “iPad” so users can not only identify key documents they may have otherwise overlooked, but they can also select which concepts (“internal” “K48” “Shimoon”) to include in the search so only the most relevant documents are retrieved.
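The select-then-search workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: the documents, the RELATED table, and the functions preview_concepts and search_selected are all hypothetical.

```python
# Toy corpus (hypothetical documents, invented for illustration).
DOCS = {
    1: "The iPad launch date is confidential",
    2: "K48 has no camera and is probably a reader",
    3: "iPod sales were strong this quarter",
    4: "iTunes update ships next week",
    5: "Shimoon mentioned the internal name on a call",
}

# Hand-written stand-in for a related-concept model (an assumption).
RELATED = {"ipad": ["k48", "ipod", "itouch", "shimoon", "itunes", "internal"]}


def preview_concepts(term, related):
    """List every concept the engine would include, before any search runs."""
    return [term.lower()] + related.get(term.lower(), [])


def search_selected(selected, docs):
    """Return ids of documents containing at least one user-selected concept."""
    wanted = {c.lower() for c in selected}
    return sorted(i for i, text in docs.items() if wanted & set(text.lower().split()))


concepts = preview_concepts("iPad", RELATED)
deselected = {"ipod", "itouch", "itunes"}  # the user's choice, made before searching
selected = [c for c in concepts if c not in deselected]

print(selected)                         # ['ipad', 'k48', 'shimoon', 'internal']
print(search_selected(selected, DOCS))  # [1, 2, 5]
```

The key design point is that the concept list is exposed before the search executes, so the reviewer, not the engine, decides how broad the retrieval should be.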

Conclusion

Not knowing what to search for as part of eDiscovery or investigations is often the biggest organizational challenge that basic keyword and traditional concept search technology has not been able to solve.  Next generation transparent concept search technology overcomes the inherent limitations of basic keyword and traditional concept searching technology by empowering users to uncover, assess, and review evidence faster and with more accuracy, thereby giving litigators or investigators new strategic advantages on every case.

E-Discovery Goes Mainstream

Tuesday, June 21st, 2011

These days, being mentioned on a late-night talk show is pretty much a stamp of “going mainstream”. This is true of celebrities (notably the One-Man Band that is Charlie Sheen), public figures (Captain “Sully” Sullenberger, who piloted the US Airways plane to a safe landing on the Hudson River), and even infomercial goods (who isn’t familiar by now with the Snuggie?).

In the e-discovery world, we realized just how mainstream this industry is becoming when it got a mention on The Daily Show with Jon Stewart. With guest Fareed Zakaria, fresh off the release of his new book, on set to discuss the American economy and the impact of technology on corporations, audiences were treated to this nugget:

Zakaria:   Machines can do things that people used to. There’s now computer programs that can do stuff that lawyers used to be able to do – discovery and things like that. May not be such a bad thing…

Stewart:   What can lawyers do that computers can’t do?

Lawyer jokes are never in short supply, and leave it to Jon Stewart not to miss a timely jab when one can be thrown. But we took notice because, of all the examples Zakaria could have used for technology’s impact on businesses everywhere — he chose to highlight the role of e-discovery software.

This was far from the first “mainstream” move for the e-discovery industry. In March, The New York Times published a featured – and top-emailed – article on advances in electronic discovery software. In May, leading analyst firm Gartner published the Magic Quadrant for E-Discovery Software, its first Magic Quadrant on the electronic discovery industry. And then in June, there it was: electronic discovery, right alongside CNN’s Fareed Zakaria and all Jon Stewart’s comedic antics on The Daily Show. Taken together, it’s clear that e-discovery is a hot topic on the minds of business folks and, increasingly, mainstream audiences. We’re eager to see where it comes up next – and secretly hoping the SNL sketch team is taking note.

Gartner Publishes First Magic Quadrant for E-Discovery Software

Friday, June 10th, 2011

Last month, Gartner published the 2011 Magic Quadrant for E-Discovery Software, its first ever Magic Quadrant (MQ) on the electronic discovery industry.

We believe the Gartner MQ signals e-discovery’s arrival as a major category of enterprise software, and creates a single, definitive “buyers’ guide” to help companies choose between the various solutions.  As the report points out, “The reason e-discovery is now a pressing issue for most companies is clear: ESI in all its many forms dominates legal proceedings because modern business is mostly conducted using electronic communications and electronic records. Regulators require this ESI to be archived for proof of compliance.”[1]

The authors of the report, Debra Logan and John Bace, are two of the industry’s leading lights. The report reflects their deep understanding of the domain and includes several keen insights into emerging trends and market dynamics.

Most software buyers are familiar with Gartner Magic Quadrants and the rigorous methodology behind them. In order to be included in the MQ, vendors must meet quantitative requirements in market penetration and customer base and are then evaluated upon certain criteria for completeness of vision and ability to execute. In the Magic Quadrant for E-Discovery Software, Gartner states that, “Ease of use, intuitive user interfaces, attorney-focused workflow, advanced but transparent semantic analysis features, native file format review, and foreign language support are all considered desirable features from the end user’s point of view.”[2] According to the report, “A vendor’s ability and willingness to perform proofs of concept (POCs) is also important, and many references told us that, with certain vendors, “try before you buy” arrangements or POCs were so successful that they did not even open their tendering process to competitive bidding.”[3]

In total, the Gartner Magic Quadrant for E-Discovery Software report analyzes 24 different e-discovery software vendors, and is meant to help CIOs, general counsel, IT professionals, lawyers, compliance staff and legal service providers understand the dynamics and landscape of the e-discovery software market. Combined with its analysis of the factors driving the growth of e-discovery and its vendor-by-vendor evaluation, we believe this makes the report a must-read for anyone involved in selecting an e-discovery solution.

For a limited time, please register here to download a complimentary copy of the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
The Magic Quadrant is copyrighted 2011 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner’s analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the “Leaders” quadrant. The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.


[1] Gartner, Inc. “Magic Quadrant for E-Discovery Software”, by Debra Logan, John Bace, May 13, 2011, page 5.

[2] Gartner, Inc. “Magic Quadrant for E-Discovery Software”, by Debra Logan, John Bace, May 13, 2011, page 8.

[3] Gartner, Inc. “Magic Quadrant for E-Discovery Software”, by Debra Logan, John Bace, May 13, 2011, page 9.

#Winning the Battle with Social Media and Electronic Discovery

Wednesday, May 25th, 2011

It seems all too easy to poke fun at Charlie Sheen’s antics of late. And, while they serve as cautionary tales in numerous contexts, his use of social media to launch his “tiger blood” fueled rampage against his former employer means that these rants may actually turn into evidence someday soon in his breach of contract action. On one hand, his public meltdown was surely a high water mark for social media as a window into the real-time (can’t look away) train wreck that is now Mr. Sheen’s career. After all, he now has over 3 million Twitter followers, and for those who don’t expressly follow his now infamous rants (e.g., “#winning”), other media outlets stand by to repost and re-tweet every scintillating (less than 140 character) proclamation.

For those who think that they’d prefer to have less Sheen in their daily diets, let’s use his 15 minutes of über-fame to examine the impact of social media on the traditionally email oriented electronic discovery process we’ve all come to know and love.  On balance, while the electronic discovery and regulatory issues are all fundamentally the same, the social media genre does genuinely pose a range of tactical and strategic challenges.

Accept Reality and Plan Accordingly

For many organizations, it’s easy to exhale now that they’ve finally reined in some of the email chaos of the 2000s. But this small victory in the larger information management war has been eclipsed by new challenges posed by social media. The problem isn’t just that the types (Twitter, LinkedIn, Facebook, Flickr, YouTube, etc.) are increasing at a mind-numbing speed, but that the volumes of accumulated data (1 billion tweets per week) are also proliferating wildly. A recent article published under the Sedona Conference’s aegis, The Impact of the Internet and Social Media on Records and Information Management: Unexpected Bedfellows Highlight the Need for Effective Information Management – Now More than Ever, points out:

“Most commentators agree that, if social networks in the workplace are inevitable, corporations must resign themselves to the inevitable and prepare accordingly. …

To combat these risks, companies must understand the interactions between IT and legal and how they intersect in the world of Records and Information Management. Companies must also employ a cross-discipline approach to issues in order to properly address access rights, digital security, Records and Information Management policies and programs, short and long-term data storage strategies, audit processes, enforcement protocols, incident and litigation response plans, and employee education programs.”

The authors correctly state: “[h]ard decisions need to be made, and resources committed, but an ounce of current prevention now may certainly outweigh the inevitable pound of information loss.”  This “pound” is most often paid out by organizations in response to electronic discovery costs, which are axiomatically linked to the amount of electronically stored information (ESI) a company needs to collect, process, review and produce for the matter at hand.

So, moving past the first stage of the Kübler-Ross grief process (denial), organizations need to accept the reality of social media, understand the implications (cost, risk, information security challenges, etc.) and start to plan accordingly. Yet the denial seems to be ever-present. A recent survey by the Electronic Discovery Reference Model (EDRM) made the startling finding that “[w]ritten policies for social media are non-existent,” with 85 percent of industry professionals admitting that “no written policies existed within their organizations regarding the preservation of data for any of the wildly popular social networking sites.”

Deploy the Right Information Governance Policies

Step one in this challenging task is to rein in the authorized/unauthorized use of social media. This likely doesn’t equate to the wholesale prohibition of all social media (which would be nearly impossible to enforce). Instead, the goal should be to define what is permissible via policies and procedures, which in turn should be used to set expectations of reasonable corporate conduct. This effort should inherently recognize that there are legitimate business purposes (ideally with defined corporate objectives), as well as individualized usages (that nevertheless may still be supportive of company goals) for the use of social media. In all instances it will be important to weigh the benefits of the social media activity (increased collaboration, real-time communication, ubiquity, etc.) against the risks (lack of information control, reduced productivity, etc.).

Organizations may also consider which of their own social media forays constitute a business record and whether or not they should be proactively archiving/capturing/preserving such ESI to create a robust document trail if social media ever becomes a factor in litigation.  Unfortunately, as with a number of other quickly developing paradigms such as cloud computing, there’s often a “ready, fire, aim” approach where the legal, risk and compliance ramifications aren’t well thought out prior to deployment.

Social Media Use Cases Proliferate

In just the past few months there have been numerous instances where social media has taken center stage in the electronic discovery and compliance realms.  Here are just a few recent examples:

  • Facebook – In what’s been called the Facebook firing case, the National Labor Relations Board (NLRB) jumped in to chastise an employer for allegedly firing an employee due to a Facebook post (where she allegedly called her boss a “mental patient”).  The NLRB said that the employer’s policy was in violation of the National Labor Relations Act, which gives employees the right to discuss “the terms and conditions of their employment with others.”
  • LinkedIn – In a recent breach of contract action, an employer claimed it had evidence of improper solicitation of its employees through the LinkedIn connections of one of the defendants.

Electronic Discovery Costs Rise in Concert with Social Media

While “ESI is ESI” to some extent, there are tactical challenges with conducting electronic discovery of social media.  Using the EDRM model as a guide, the first main hurdle will be Identification.  In this stage, the challenge will be to identify (ideally through data maps and corporate policies) the instances of social media that are being legitimately used within an enterprise.  This process is typically linked to key players, versus searching the entire data universe for specific key terms.

Once this thorny issue is confronted, the even bigger challenge involves determining how to manage the risk-laden Preservation process.  Here, the highly dynamic nature of social media stands in contrast with its email counterpart, which in many ways was designed to be used in a linear, threaded format.   The preservation of social media has two main challenges.  The first one is access since the information may be private or semi-private.  In this setting, the recent case of Romano v. Steelcase Inc. (N.Y. Sup. Ct. 2010) shed light on some of the privacy issues:

“[W]hen Plaintiff created her Facebook and MySpace accounts, she consented to the fact that her personal information would be shared with others, notwithstanding her privacy settings. … Since Plaintiff knew that her information may become publicly available, she cannot now claim that she had a reasonable expectation of privacy. As recently set forth by commentators regarding privacy and social networking sites, given the millions of users, ‘[i]n this environment, privacy is no longer grounded in reasonable expectations, but rather in some theoretical protocol better known as wishful thinking.’”

Assuming the privacy issue can be overcome, the next issue is encountered during the Collection phase, since it will be hard to actually preserve social media in situ because the content is dynamic and at the whim of the user and the applicable site. Aside from a few recent developments like Facebook’s “Download Your Information” feature and the Library of Congress’s decision to archive all tweets since March 2006, most social media ESI is fleeting and hard to collect. The current brute force methodology is to simply take screenshots of the relevant blog, tweet or status update. The challenge with this approach isn’t necessarily with admissibility (see below). Instead, this static modus operandi makes it nearly impossible to utilize analytical tools to search and review the social media content. Assuming the use cases and volumes continue to proliferate at current speeds (which is a safe bet), leveraging analytical tools becomes that much more important in the effort to control costs.

Over time, enterprises will attempt to funnel users into corporate social media platforms that have governance abilities built in, particularly export functionality that will permit more advanced search and analytics from purpose-built e-discovery applications.  But, for now, the herding cats exercise will continue as the corporate community tries to keep up with the faster moving user base.

Admissibility is an Open Question

Not to be forgotten, the core underpinning to a successful collection of social media content is the ESI’s admissibility, since the ability to use data in court is the sine qua non for conducting the electronic discovery process in the first place. In the seminal case on the admissibility topic, Lorraine v. Markel Am. Ins. Co. (2007), United States Magistrate Judge Paul W. Grimm noted that “[v]ery little has been written, however, about what is required to insure that ESI obtained during discovery is admissible into evidence at trial.” He went on to note that “[t]his is unfortunate, because considering the significant costs associated with discovery of ESI, it makes little sense to go to all the bother and expense to get electronic information only to have it excluded from evidence or rejected from consideration during summary judgment because the proponent cannot lay a sufficient foundation to get it admitted.”

While it’s difficult to quickly summarize the authentication requirements for social media, authentication is an important step that can be achieved in a number of ways (pursuant to Federal Rule of Evidence 901, which only provides illustrations). The most likely course would be to utilize the “personal knowledge” approach, which permits authentication by testimony “that a matter is what it is claimed to be.” Other approaches, such as utilizing metadata (which works well for more static ESI), will be less applicable for social media since it is more ephemeral.

Conclusion

Many have proclaimed that social media will soon render email obsolete as the primary form of corporate communication.  While this remains to be seen, social media is quickly elbowing its way into the electronic discovery conversation and this new kid on the block needs to be taken seriously, or the unwary practitioner may unwittingly find out (via tweet) that their case has been dismissed because of spoliation.

Clearwell Signs Agreement To Be Acquired By Symantec

Thursday, May 19th, 2011

I am thrilled to announce that Clearwell has signed an agreement to be acquired by Symantec for $410 million ($390 million, net of our cash balance of $20 million). By bringing together Clearwell’s market leading e-discovery platform with Symantec’s market-leading archiving solution, we are uniquely positioned to provide customers with the next generation of information management solutions.

The e-discovery software industry has matured rapidly in the 6 years since Clearwell was founded. As electronic information has become a key part of all litigation, regulatory inquiries, and internal investigations, companies have had no choice but to adopt e-discovery software to keep their costs down. Some have done so by bringing e-discovery in-house; others prefer to work with law firms and litigation support companies who provide cloud-based solutions. Either way, e-discovery software has become widely adopted by corporations, government agencies, and law firms around the world.

Clearwell has been a major beneficiary of these trends. Our annual sales have grown rapidly to over $50 million, and the company has been profitable since 2009. Today, we have over 400 customers and 75 partners in 14 different countries.

Many of these customers are using Clearwell together with Symantec Enterprise Vault in a single integrated workflow, and they have often requested that we couple our products more tightly to better serve their information management needs. That’s what led us to partner with Symantec for the past several years and ultimately led to this transaction. Over time, we see corporations and government agencies increasingly seeking information management solutions that encompass both e-discovery and archiving, making the combination of Clearwell with Enterprise Vault incredibly compelling.

In the near term, we expect very little to change for our existing customers. The product will continue to be sold on a standalone basis and supported by the Clearwell team. We remain committed to serving law firms and litigation support partners, who are absolutely critical to our success in more ways than we can describe.

This is an exciting time for the e-discovery industry. Last week, Gartner published its first ever Magic Quadrant For eDiscovery Software. Today, Symantec and Clearwell join forces to deliver a seamless, integrated archiving and e-discovery management workflow, benefitting all our customers. You can find more information about the acquisition at: http://www.symantec.com/clearwell.  There are exciting times ahead.

IBM’s Watson: Can It Be Used for E-Discovery?

Thursday, May 12th, 2011

As the buzz around Watson and its foray into human-like (actually super-human) performance subsides, it may be time to take stock of what all the fuss was about. After all, we’re all used to computers doing better than humans at many things and even take their superior store of knowledge for granted. And, on the surface, we get answers to questions on pretty much anything from a simple Google or Bing search. So, what really is the big deal and is it even relevant in the context of electronic discovery?

For those not clued in on this, Watson is the brainchild of a four-year effort by 20-25 researchers at IBM to build a computing engine that could compete at champion level on the popular quiz show Jeopardy. Although it blundered on a couple of answers, it competed very well and won by a wide margin. Several industry experts who learned of it and watched the show have lauded this as an accomplishment on the same scale as, or even greater than, IBM’s Deep Blue beating world chess champion Garry Kasparov in 1997. So, let’s examine whether this is indeed worthy of the accolades it has gotten.

Behind Watson is an impressive piece of hardware: a cluster of 90 IBM Power 750 nodes, adding up to 16TB of memory and 2,880 Power7 processor cores delivering a staggering 80 teraflops of peak performance. All the hardware is highly interconnected, with the ability to work on problems in parallel while still marching to a final result in three seconds or less, just fast enough to beat the human buzzer. Some highlights of the computing infrastructure from the hardware architect, Dr. James Fan, at IBM indicate that the three-second timeframe meant the entire corpus of 200 million pages had to be loaded into memory. Also, with several processors simultaneously working on pieces of the problem, the workload places very high I/O demands on the system. The hardware supports a multi-processing OS, with virtualization, in a workload-optimized system. The software drives the hardware using thousands of dense threads, with each thread of execution processing a large chunk of work with minimal context switching. Also, given the large number of cores, each thread is optimally allocated to a core. Branded as DeepQA, the software executes a series of complex algorithms in order to solve a very specific problem: winning on Jeopardy.
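To put the headline hardware figures in perspective, a little arithmetic gives the per-node and per-core breakdown. The inputs come from the figures quoted above; the even split across nodes and the 1TB = 1,024GB conversion are assumptions made for this back-of-the-envelope sketch.

```python
# Figures quoted above for the Watson cluster.
nodes = 90
cores = 2880
memory_tb = 16
peak_teraflops = 80

# Assumes resources are spread evenly across nodes and 1 TB = 1024 GB.
cores_per_node = cores // nodes
memory_gb_per_node = memory_tb * 1024 / nodes
peak_gflops_per_core = peak_teraflops * 1000 / cores

print(cores_per_node)                  # 32
print(f"{memory_gb_per_node:.1f}")     # 182.0
print(f"{peak_gflops_per_core:.1f}")   # 27.8
```

In other words, each node would carry 32 cores and roughly 182GB of memory, which is what makes holding the whole corpus in memory plausible.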

First, the Jeopardy game provides categories of clues. Some categories help in understanding the clue, while others are simply misleading to a computer. Next, the clue is revealed and one needs to determine what the clue is really asking, since many clues do not ask for a factoid with a direct question, but are rather compositions of multiple sub-clues, each related to another by some linguistic, semantic, syntactic, temporal or other form of connection. Decomposing clues and figuring out the relationships is a challenge even for humans. Finally, after understanding the clue, one has to home in on an answer with some level of confidence, within a three-second window, and activate the answer buzzer ahead of the other competitors. Besides individual clues, one also has to devise an overall game strategy for selecting the next category, selecting a clue within that category, and deciding how much to wager on Daily Doubles and Final Jeopardy. Overall, the game is a complex amalgamation of knowledge, language analysis, gaming strategy and speed of recall.

The software architecture of the DeepQA system is documented in a paper published in AI Magazine. The team built several components to address each area of the problem, with many independent algorithms in each component.  There are lots of complicated technical details, but the final outcome is a human-like response.

A question that anyone who examines its inner workings has is whether the system is really natural language processing, statistical language analysis, machine learning, or some sort of ad-hoc program that doesn’t fit any traditional area of analytics. It does appear to be a combination of several techniques, which may mirror exactly how humans go about solving these clues. We seem to have a large collection of knowledge, initially unconnected, but the category, the clue, and the hypothesis all appear to generate word and concept associations and a fuzzy evaluation of confidence measures, which converge into the confidence with which a competitor answers a question. It is the replication of these processes by algorithms that makes it a truly astounding achievement.

Given the success of DeepQA’s performance, a natural question is whether it has any practical value for helping us solve day-to-day problems. More specifically, can it cope with information overload and the e-discovery challenges posed by that mass of information? Its use within the e-discovery context has been explored by several authors, most notably Robert C. Weber of IBM and Nick Brestoff in recent Law.com articles. Their analysis is based on the ability to explore vast volumes of knowledge. But really, what DeepQA tackled is something more significant: the inherent ambiguity in human spoken and written communication. Our natural instincts are to employ subtle nuances, indirect references, implicit assumptions, and incomplete sentences. We tend to leverage prior and surrounding context in most of our communications. It’s just the natural way of communicating, since doing so is actually very effective. We treat explicitly establishing context as redundant, unproductive and unnecessary, as it tends to make communication repetitive. By not employing a rigid structure in how we write, we are able to keep exchanges concise even when they span a large volume of information.

If the last two decades are any indicator, communication is getting less formal, with emails, instant messages, tweets, and blog posts replacing well-crafted formal letters and memos. Forcing individuals to communicate in rigid, unambiguous text so that computers can process it easily would mean a huge change in how people communicate. A change in behavior across billions of people is simply not going to occur. What this means is that the burden on automated analysis using computing algorithms is even greater. This is what makes the discovery of relevant content in the context of e-discovery a very hard problem, one that is worthy of the sort of technological prowess employed by the DeepQA team.

Given that our appetite for producing information is ever-increasing while finding what matters in it is getting harder, taking the work of DeepQA and adapting it to e-discovery needs has the potential to make significant improvements in how we tackle the search, review and analytical aspects of e-discovery.  DeepQA took an easily articulated goal of answering at least 60% of the clues with 85% precision in order to reach champion levels. That was sufficient to win the game. Note that there was never an attempt to answer 100% of the clues with 100% confidence. In the realm of e-discovery, we would be looking at taking a very general production request such as TREC 2009 Topic 201, “All documents or communications that describe, discuss, refer to, report on, or relate to the Company’s engagement in structured commodity transactions known as prepay transactions,” and using just such a simple articulation of the request to produce relevant documents. It is the core algorithms of machine learning, multiple scoring methods, and managing relevance and confidence levels, along with traditional information retrieval methods, that form the ingredients of the new frontier of automated e-discovery. Beyond e-discovery, application of DeepQA algorithms to business analytics also has significant potential, since fact- and evidence-based decision making using unstructured data is likely to be the norm there. DeepQA’s very public Jeopardy challenge has shown that the ingredients needed for enabling such problem solving are well within the realm of possibility.
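To make the "60% of the clues with 85% precision" idea concrete, here is a minimal sketch in Python of how a confidence threshold trades coverage against precision. It is not DeepQA's code: the `Scored` class, the `coverage_and_precision` function and the sample numbers are all hypothetical, and in an e-discovery setting the "correct" flags would have to come from a human-reviewed sample.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    confidence: float  # hypothetical system confidence, between 0 and 1
    correct: bool      # ground truth, e.g. from a human-reviewed sample


def coverage_and_precision(items, threshold):
    """Attempt only items at or above the threshold.

    Returns (coverage, precision): the share of all items attempted,
    and the share of attempted items that were correct.
    """
    attempted = [i for i in items if i.confidence >= threshold]
    if not attempted:
        return 0.0, 0.0
    coverage = len(attempted) / len(items)
    precision = sum(i.correct for i in attempted) / len(attempted)
    return coverage, precision


# Made-up scores for ten clues (or ten documents), for illustration only.
sample = [
    Scored(0.95, True), Scored(0.90, True), Scored(0.85, True),
    Scored(0.80, False), Scored(0.70, True), Scored(0.60, True),
    Scored(0.40, False), Scored(0.30, True), Scored(0.20, False),
    Scored(0.10, False),
]

for threshold in (0.85, 0.60):
    cov, prec = coverage_and_precision(sample, threshold)
    print(f"threshold={threshold:.2f} coverage={cov:.0%} precision={prec:.0%}")
```

Raising the threshold means fewer items are attempted but more of them are right, which is the same trade-off a review team faces when deciding how aggressively to cull a document set.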

Electronic Discovery Cases You Must Know

Tuesday, May 10th, 2011

I was at the Sedona midyear meeting last week, and during Ken Withers’ excellent discussion of recent e-discovery case law a few thoughts occurred to me. First, there are so many cases coming out each week now that it’s hard to stay above the fray and mine for useful nuggets. The task is a bit Sisyphean, so folks like Ken (who keep a rolling index of cases) are particularly helpful. Next, I was struck by how hot Pension Committee still is, even after almost a year and a half. Certainly, this ongoing spotlight wasn’t an accident, and it’s almost certain that Judge Scheindlin is pleased by the ongoing debate.

I frequently get questions from enterprise clients regarding which cases they should know about, and so I put together an EDRM oriented (left to right) list for folks who just can’t get to all the latest cases. While it’s not an annual roundup per se, I do think it’s a bit more functional for busy electronic discovery professionals who need to stay current. So, here’s the buzz index of cases arranged by topic:

Preservation: The Legal Hold Gold Standard

Case: Pension Committee of the Univ. of Montreal Pension Plan, et al., v. Banc of America Securities, LLC, et al. (S.D.N.Y. 2010).

Summary: The dispute focused on claims by a group of investors who brought an action to recover losses of $550 million stemming from the liquidation of two British Virgin Islands-based hedge funds. Unlike many typical e-discovery disputes, the instant action focused on the conduct of the plaintiffs as they attempted to deal with the often murky landscape of electronically stored information (ESI) preservation, collection and production. Judge Scheindlin goes out of her way to crystallize duties and identify the type of conduct that can cause an e-discovery breach. “After a discovery duty is well established, the failure to adhere to contemporary standards can be considered gross negligence. Thus, after the final relevant Zubulake opinion in July, 2004, the following failures support a finding of gross negligence, when the duty to preserve has attached:

  • to issue a written litigation hold;
  • to identify all of the key players and to ensure that their electronic and paper records are preserved;
  • to cease the deletion of email or to preserve the records of former employees that are in a party’s possession, custody, or control;
  • and to preserve backup tapes when they are the sole source of relevant information or when they relate to key players, if the relevant information maintained by those players is not obtainable from readily accessible sources.”

Why it’s (still) important: First of all, Pension Committee is written by Judge Scheindlin, who is the most famous electronic discovery jurist on the planet. Next, because she sits in the Southern District of New York, even folks in other jurisdictions who aren’t bound by her opinions must still take heed, given that New York is home to so many multinational organizations. Finally, her opinion is the clearest (even if disputed) articulation of the standard of care for the issuance of legal holds and the duty to preserve ESI. She attempts to categorically define conduct that is grossly negligent and therefore susceptible to extreme sanctions, including spoliation inferences and terminating sanctions. Fortunately, she recognizes the numerous challenges associated with electronic discovery. And, so as to blend in a healthy dose of reality, Judge Scheindlin also said: “In an era where vast amounts of electronic information is available for review, discovery in certain cases has become increasingly complex and expensive. Courts cannot and do not expect that any party can meet a standard of perfection.”

In the end, Pension Committee was the case of the year in 2010, and even in 2011 it’s generating an unprecedented level of retrospectives (here and here). It may be because Judge Scheindlin’s relatively bright line standard has created so much debate, but the Pension Committee discussion will likely continue for the foreseeable future (perhaps only ending when/if the culpability rules are amended to create a unified national standard).

Preservation: Why Preserve in Place is Risky?

Case: Wilson v. Thorn Energy, LLC, (S.D.N.Y. 2010).

Summary: In Wilson, the defendant corporation identified a flash drive that contained relevant ESI, but rather than copying that data safely to a centralized evidence repository, the defendant’s employee chose to hold on to the drive, putting it instead into a desk drawer. When the files were requested for review and production, the files could not be read from the drive. The defendant’s employee attempted to recover the ESI contained on it, but those efforts failed. Granting plaintiffs’ motion for sanctions, the court ordered that defendants would be precluded from offering evidence at trial concerning the data contained on the discarded drive.

Why it’s important: In today’s e-discovery world, many organizations are instituting hold processes via manual solutions and then waiting weeks or months to ultimately collect the ESI. Wilson shows the danger of simply preserving data and makes the argument that you should either “collect to preserve” or collect very shortly after the litigation hold notice goes out. While focusing on a certain media type (flash drive), this analysis can be extended to any digital system containing ESI that inherently has some set failure rates or can be imagined to fail without express, conscious action (due to loss, theft, recycling, etc.).

Identification & Collection: “Manual” Collections Come Under Fire

Case: Green v. Blitz U.S.A. (E.D. Tex. Mar. 1, 2011)

Summary: In this case, Plaintiff sought to re-open her lawsuit despite prior settlement upon learning that defendant had failed to produce relevant documents. Finding that defendant had committed discovery abuses, including failing to disclose relevant evidence and failing to issue a litigation hold, the court ordered defendant to pay plaintiff $250,000, to provide a copy of the court’s order to plaintiffs “in every lawsuit proceeding against it” for the past two years and to file the court’s order in every case that defendant is involved in for the next 5 years. It was revealed that the employee “solely responsible for searching for and collecting documents relevant to litigation” issued no litigation hold, conducted no electronic word searches for emails, and made no effort to speak with defendant’s IT department regarding how to search for electronic documents.

Why it’s important: Green is the latest in a line of cases [See also Ford Motor Co. v. Edgewood Properties Inc., 257 F.R.D. 418 (D.N.J. 2009) and Phillip M. Adams & Assoc., LLC v. Dell, Inc., 621 F. Supp. 2d 1173 (D. Utah 2009) ] that have been highly critical of manual (or self) collection efforts by the individual custodians. Historically, if the custodians were monitored/supervised enough by counsel, this manual collection process was largely deemed defensible, but it looks like this behavior is simply too risky for any conservative enterprise. The better practice is to leverage the custodians to point out where relevant ESI might exist and utilize software tools to conduct broad collections from key players. While it’s not necessary to use IT tools to collect data immediately for all custodians who have received a litigation hold notice, it’s probably unreasonable to not quickly collect ESI (via formal, IT based methods) from at least some subset of key players. The main point is that this isn’t an all or nothing calculation. Costs, risks and benefits should all be carefully evaluated and documented, in case there’s a downstream challenge.

Analysis & Review: Failure to Test Keywords and Sample

Case: Mt. Hawley Ins. Co. v. Felman Prod., Inc., (S.D. W. Va., 2010).

Summary: In this case the court examined the reasonableness of plaintiff’s precautions to prevent disclosure of email, which was inadvertently produced by the plaintiff amidst “a massive disclosure of e-discovery.” The Mt. Hawley court applied the five-factor test established in Victor Stanley, Inc. v. Creative Pipe, Inc. (D. Md. 2008) and found that the producing party had not taken reasonable steps during discovery. In particular, the court was unwilling to find that the inadvertent production of 377 privileged documents was “solely attributable” to a technological glitch and instead found that plaintiff and counsel “failed to perform critical quality control sampling to determine whether their production was appropriate and neither over inclusive nor under-inclusive.” This finding meant that their attorney-client privilege was waived as to the subject documents.

Why it’s important: Mt. Hawley demonstrates why sampling and keyword search term formulation are critically important to any defensible discovery effort. In many instances where “blind” keyword strategies are used, the producing party is taking on an undue risk, in essence flirting with the “3rd rail” of electronic discovery (inadvertent production). Blind keyword searching (followed by brute force review and production) is sadly still a very common practice today. My hope is that cases like Mt. Hawley will force the blissfully ignorant practitioners to take stock of their risky practices and get with contemporary best practices like ECA, sampling, iterative search and the like.
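For readers wondering what quality control sampling might look like in practice, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a description of what the Mt. Hawley court required or of any particular review tool: the function names, the document IDs, the sample size of 400 and the reviewer results are all hypothetical, and the margin of error uses a simple normal approximation.

```python
import math
import random


def draw_qc_sample(doc_ids, sample_size, seed=0):
    """Return a reproducible simple random sample of document IDs."""
    docs = list(doc_ids)
    rng = random.Random(seed)  # fixed seed so the draw can be documented
    return rng.sample(docs, min(sample_size, len(docs)))


def estimate_rate(flags):
    """Estimate the share of flagged documents in a reviewed sample.

    Returns (rate, margin), where margin is a rough 95% margin of error
    based on the normal approximation.
    """
    n = len(flags)
    if n == 0:
        raise ValueError("cannot estimate a rate from an empty sample")
    rate = sum(flags) / n
    margin = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return rate, margin


# Hypothetical production set of 10,000 documents ready to go out the door.
production = [f"DOC-{i:05d}" for i in range(1, 10001)]
qc_sample = draw_qc_sample(production, 400, seed=42)

# Suppose a reviewer flags 8 of the 400 sampled documents as privileged.
reviewer_flags = [True] * 8 + [False] * 392
rate, margin = estimate_rate(reviewer_flags)
print(f"Estimated privileged rate: {rate:.1%} (+/- {margin:.1%})")
```

The point is less the arithmetic than the record: a seeded, documented sample gives counsel something concrete to point to if the reasonableness of the production is later challenged.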

Conclusion

Simply by creating such a list, I’m sure to leave off cases other folks think are more buzz worthy. But, for me, having a few good legal chestnuts is better than trying to boil the ocean and synthesize all the available case law. If you have any comments (good, bad or indifferent), I’d be eager to hear them.

Self Collections in E-Discovery – Just too Risky for Prime Time

Wednesday, April 20th, 2011

In past blogs I’ve discussed a number of cases that have expressed skepticism over the self collection of electronically stored information (ESI) in the electronic discovery process.  In many of these cases, the reviewing judge or magistrate has looked at this process with an increasingly jaundiced eye, in some cases using the self collection component as part of its rationale for sanctions.

My conclusion up until now has been that this collection methodology (where employees manually select and potentially harvest their own data) could be defensible if properly executed; meaning with the requisite level of attorney guidance and oversight.  And, while this is still technically accurate, I think the pendulum has swung far enough to proclaim that this approach is simply far too dangerous for most enterprises, except perhaps those that are extremely risk tolerant.

While there was no particular straw that broke the camel’s back, the trend in the case law now seems to be moving inexorably in one direction – i.e., that self (or manual) collection is no longer safe enough for average enterprises.  Just like tightrope walking without a safety net, self collection protocols aren’t inherently doomed to failure, but there isn’t much (probably any) margin for error.

In the recent case of Green v. Blitz U.S.A. (E.D. Tex. Mar. 1, 2011), we see yet another example of self collections gone awry.  In Green, the plaintiff sought to re-open her lawsuit despite a prior settlement, once she suspected that the defendant had failed to produce relevant ESI.  Finding that the defendant had committed numerous discovery abuses, including not disclosing relevant evidence and failing to properly issue a litigation hold, the court put the hammer down, issuing a wide range of sanctions:

  • Defendant had to pay plaintiff $250,000
  • Defendant had to provide a copy of the court’s order to plaintiffs “in every lawsuit proceeding against it” for the past two years
  • Defendant had to file the court’s order in every case that it is involved in for the next 5 years.

Self collection was again a material culprit in the culpable behavior.  It was revealed that the employee “solely responsible for searching for and collecting documents relevant to litigation” issued no litigation hold, conducted no electronic word searches for emails, and made no effort to speak with defendant’s IT department regarding how to search for electronic documents.  To exacerbate matters, the main individual in charge of collections was also closely tied to the research and development of the “flame arresters” that were at issue in this exploding gas can case.

Adding fuel to the fire (pun intended) was the fact that the responding company failed to locate or collect emails with the search term “flame arrester.”  The court went on to note that some of the smoking gun emails not only contained this “most obvious term to search for in electronic documents in this case”, but in fact “flame arrester” was used in the title of certain emails.

While folks have called this practice an example of a “fox guarding the henhouse,” in my mind custodial bias is not the main reason self collection is too risky for prime time.  While there are certainly examples like the Green case where bias likely was an issue, the bigger problem is that any significant reliance on custodians to direct a collection process (even a well supervised one) has too many failure points.  The most obvious (and innocent) scenario is where custodians simply can’t remember if they had responsive ESI or where such information might reside.  This problem can be particularly acute given the fact that litigation is almost always conducted in the rear view mirror.  Since I often can’t remember what I had for breakfast, I don’t find it surprising that a custodian might not recall if they had data relating to a discrete issue 4-5 years ago.

As such, the contemporary “best practice” for the collection of ESI is quickly evolving past the old manual collection workflow.  Technology is rapidly making it quick and painless to conduct searches for blatantly relevant ESI (like emails with “flame arrester” in the title).  Not only can you conduct basic searches with existing technologies, but recent advancements around concept search and other analytical tools make the failure to leverage these technologies seem that much more unreasonable.  For example, in the recent case of Northington v. H&M Int. (N.D. Ill. Jan. 12, 2011), the defendant was sanctioned, in part, because it didn’t search for minor misspellings of the plaintiff’s name.
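To show how little effort a variant-tolerant search takes, here is a minimal sketch using only Python’s standard library. It is not how any particular e-discovery product works; the function names, the sample emails, and the 0.85 similarity threshold are all assumptions chosen for illustration, and a real matter would need tuning and validation of that threshold.

```python
import re
from difflib import SequenceMatcher

def fuzzy_hits(term, text, threshold=0.85):
    """Return phrases in text whose similarity to term meets the threshold."""
    target = " ".join(term.lower().split())
    n = len(target.split())
    words = re.findall(r"[a-z0-9']+", text.lower())
    hits = []
    # Slide a window of the same word count as the term across the text.
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        if SequenceMatcher(None, target, candidate).ratio() >= threshold:
            hits.append(candidate)
    return hits

def search(emails, term, threshold=0.85):
    """Return IDs of emails whose subject or body contains a near match."""
    return [e["id"] for e in emails
            if fuzzy_hits(term, e["subject"] + " " + e["body"], threshold)]

# Hypothetical sample data, not taken from any case record.
emails = [
    {"id": 1, "subject": "Flame arrester test results", "body": "See attached."},
    {"id": 2, "subject": "Lunch", "body": "Are we still on for Friday?"},
    {"id": 3, "subject": "RE: flame arrestor design", "body": "Revised drawings."},
]

print(search(emails, "flame arrester"))
```

An exact-match search would miss the third message because of the “arrestor” spelling; the fuzzy pass picks it up along with the first.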

When a court sees manual blunders like that in the Green case it’s not surprising to see such missteps cast as at least negligent and perhaps even more culpable.  This conclusion is made even easier when organizations like the Sedona Conference (in its Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery) state:  “[i]n many settings involving electronically stored information, reliance solely on a manual search process for the purpose of finding responsive documents may be infeasible or unwarranted. In such cases, the use of automated search methods should be viewed as reasonable, valuable, and even necessary.”

Green is now the latest in a line of cases [See also, Ford Motor Co. v. Edgewood Properties Inc., (D.N.J. 2009) and Phillip M. Adams & Assoc., LLC v. Dell, Inc., (D. Utah 2009)] that have been highly critical of self collection efforts by individual custodians.  The better practice is to utilize technology to conduct collections from key players and perhaps leverage the custodians (and technology) to point out where relevant ESI might exist.  As such, a belt and suspenders approach is undoubtedly the safer way to go.  In this “dual protection” scenario “key custodians still search, identify, and self-collect what they think are relevant emails, but, as a fail safe, IT also collects all of the key custodians’ emails. Then attorneys search and identify relevant documents from this full, uncensored, unfiltered, collection. This double effort guards against the intentional and unintentional mistakes that can sometimes arise in self-collection.”
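The “dual protection” idea lends itself to a simple cross-check: compare what custodians handed over against what IT collected and flag relevant items that only appear in the full set. The sketch below assumes messages are plain strings and that relevance is decided by a caller-supplied function; the names `fingerprint` and `missed_by_self_collection` are hypothetical, and hashing the text with SHA-256 is just one convenient way to match identical items.

```python
import hashlib

def fingerprint(message: str) -> str:
    """Hash a message so identical items can be matched across collections."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def missed_by_self_collection(self_collected, it_collected, is_relevant):
    """Return relevant messages in the full IT collection that custodians did not turn over."""
    seen = {fingerprint(m) for m in self_collected}
    return [m for m in it_collected
            if is_relevant(m) and fingerprint(m) not in seen]

# Hypothetical example data.
it_collected = [
    "Subject: flame arrester test results",
    "Subject: lunch on Friday",
    "Subject: flame arrester redesign costs",
]
self_collected = ["Subject: flame arrester test results"]

gaps = missed_by_self_collection(
    self_collected, it_collected,
    is_relevant=lambda m: "flame arrester" in m.lower(),
)
print(gaps)
```

Anything this check returns is exactly the kind of item the fail-safe collection exists to catch.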

We know that electronic discovery is never going to be a perfect process, but self collections simply inject too much risk into an already complicated process.  Now is the time to change tactics and stop tightrope walking without a safety net.  After all, no enterprise wants to be the next to endure a highly publicized fall.

Clearwell, NDLON v. ICE, and the Pavlik-Keenan Declaration

Friday, April 15th, 2011

On February 21, 2011, U.S. Immigration and Customs Enforcement (ICE) filed a declaration as part of its ongoing case with the National Day Laborer Organizing Network (NDLON). That declaration, written by Ms. Pavlik-Keenan, made references to Clearwell that have since been selectively quoted and commented on by several of our competitors. Essentially, these competitors have tried to exploit the declaration and use it in a way which was not intended by ICE.

We realized on reading the Pavlik-Keenan Declaration (PK Declaration) that it contained many misstatements. Until now, we have not responded publicly because this is an ongoing legal matter involving one of our customers, and we do not want to weaken ICE’s stated position. We also felt that, rather than speak out ourselves, it was more appropriate to approach ICE and ask for its help in correcting the public record. That process is ongoing, but there are now new documents from ICE in the public domain which correct some of the misstatements in the PK Declaration. This blog post is based on information from these new documents.

When reading the PK Declaration, there are 4 important considerations to keep in mind:

#1: NDLON vs. ICE is a precedent-setting case

This is bleeding-edge stuff. The issue here is whether metadata is part of the public record and therefore discoverable under the Freedom of Information Act (FOIA). Judge Scheindlin says it is; ICE disagrees and is opposing her order to produce it. This is a grey area because the standards for FOIA are quite different from those for electronic discovery. Unlike with litigation, FOIA requests are governed by 9 exemptions which are designed to protect private information from being released to the public, and not by the FRCP. Should government agencies now be required to provide metadata, then they must redact that metadata to remove information covered by these 9 exemptions, which (according to the government) is difficult and expensive to do. To our knowledge, it’s also something that has hardly ever been done before, because people generally don’t redact metadata in a load file. So, prior to Clearwell’s use in this case, e-discovery products have typically not been used in this way.

#2: A declaration is an advocacy document, not a ruling from a judge.

This means the PK Declaration is designed to argue a point of view based on personal opinion, not be a statement of fact or legal conclusion. The stakes for the government in this matter are very high, since there are potentially thousands of FOIA requests which could be impacted by Judge Scheindlin’s ruling. So ICE has every incentive to argue forcefully that, whatever technology is used, the resources needed make it excessively expensive to comply with the court’s order. It just so happens that the technology used in this particular case is Clearwell.

#3: The PK Declaration is factually incorrect in several important areas.

There are many statements in the PK Declaration which are – quite simply – not true. To give 2 specific examples from the document:

A. Claim in PK Declaration: 11. … OPLA [Office of the Principal Legal Advisor] estimates that it was forced to expend more than $270,000.00 in upgrades, including the acquisition of a new $32,000.00 server, during this period in order to have access to and run the application.

A. Fact: Neither OPLA nor any other part of ICE paid a dime for upgrades or a new server. In reality, its use of the product for this matter is covered under ICE’s existing license, and we provided an extra server and services for free to help them meet a tight deadline. Clearwell’s “re-usable” license is specifically designed to allow customers to deal with unexpected cases at no incremental software cost, which is what happened here.

B. Claim in PK Declaration: 19. … ICE has been advised by the software vendor that ICE’s software, as it currently exists, cannot produce a “load file” that is compatible with Concordance 8X and/or Opticon 3X.

B. Fact: There are many customers using Clearwell today to produce “load files” for any of several industry leading formats in its export/production options, including Concordance, Summation, EDRM XML and Opticon. Clearwell also offers “Configurable Templates” to produce in any form that is requested.

To address these inaccuracies, on April 11, 2011, the US Attorneys’ Office took the highly unusual step of filing a supplementary declaration to (in their words) “clarify” its earlier statements. In the newly filed Document 86, Declaration of Ryan Law, it states:

6. … “the $270,000.00, which includes $32,000.00 for acquisition of a new server, has not yet been spent. … Clearwell loaned a new server to ICE for the duration of the January 17, 2011 production.”

7. … “ICE was not stating that Clearwell does not allow for the production of such a load file, just that ICE cannot do it with its current software.” All that’s required is a configuration file.

In addition to specific statements, such as those listed above which are now being clarified, the PK Declaration also makes broad generalizations about Clearwell which are untrue. ICE has released new information to address the most widely circulated of these:

C. Claim in PK Declaration: 14. … the agency should abandon the Clearwell application and discontinue its use.

C. Fact: ICE is a happy Clearwell customer who regularly takes reference calls on our behalf from other Federal agencies. As evidence of this, on April 8, 2011 (i.e., 6 weeks after the PK Declaration was filed), the contracting officer at ICE issued a letter (referenced in Document 86) which states that Clearwell “meets the government’s needs and performed in a satisfactory manner. As a result…ICE exercised the option (effective September 23, 2010) to extend the term of the contract through September 22, 2011.” You can read more about this in today’s press release.

#4: The PK Declaration misrepresents how Clearwell is being used at ICE.

ICE purchased Clearwell in 2009 for use by OPLA (Office of Principal Legal Advisor) on civil litigation matters. FOIA requests are handled by a different department with whom we had no contact until, without our knowledge, ICE FOIA decided to borrow Clearwell from ICE OPLA to respond to the FOIA request from NDLON in December 2010. In 16 working days, Clearwell was used to process a large volume of information and produce nearly 15,000 pages of Opt-Out Records (Document 79, Filed 3/30/11, Declaration by Sarahi Uribe). To help ICE meet its deadline, two Clearwell consultants worked onsite during this period – at absolutely no cost to ICE.

Much of this is recounted in the first Declaration of Ryan Law (Document 68, Filed 3/23/11), in the section entitled “Description of Events that Led to the Use of Clearwell”. I draw from that document below:

21: I [Ryan Law] consulted with other program offices within ICE and determined that using the e-discovery platform “Clearwell”, which is owned by the ICE Office of the Principal Counsel (“OPLA”), would offer the best chance for the Agency to meet its court-ordered disclosure requirement.

24: Clearwell was not obtained by ICE for FOIA applications. During the three-year application procurement and development process, OPLA did not take FOIA needs into consideration in determining the relevant capabilities the application would require.

26: At the time of the Court’s December 9, 2010 order, the Clearwell application was untested and was not yet approved for use. At that time, OPLA was in the beginning stages of establishing protocols for the use of the software and had originally anticipated a pilot phase of testing to begin in February or March 2011.

27: ICE OPLA was able to deploy Clearwell on December 20, 2010 for use by the FOIA Office on a provisional basis specifically to meet the January 17, 2011 deadline imposed for the production of records responsive to the Plaintiff’s request for “opt-out” records.

***

There is still a lot that we cannot say publicly about the PK Declaration, out of respect for ICE (our customer), which is engaged in active litigation. But we would be happy to provide further information to concerned parties under NDA.

Clearwell’s Use In The Matter of Datel v Microsoft

Monday, April 4th, 2011

It’s widely known that Microsoft is a Clearwell customer, and uses our product for e-discovery across a wide range of matters. One such matter is the case of Datel Holdings v. Microsoft Corporation, which is presently in District Court for the Northern District of California. As part of those proceedings, Microsoft mentioned Clearwell in its Opposition to Datel’s Motion to Compel that was ruled upon on March 11, 2011:

Defendant explains that after potentially responsive documents were collected from custodians, they were loaded into a computerized document processing system known as “Clearwell.” Clearwell extracted metadata from each document and converted the documents into a format that allowed for text searching. Once the documents were processed through Clearwell, they were entered into an online platform, where they were reviewed by attorneys. For reasons still unknown to Defendant, Clearwell truncated some “Re-auth” documents during processing.

In itself, this sounds unremarkable. But we’ve noticed that some of our small competitors have been using this statement, and particularly the last line of it, to suggest that there are problems with the Clearwell product.

We realize that, as the market leader, there will always be small competitors seeking to leverage any opening to their advantage. Usually, we ignore this nonsense. But this time, to set the record straight, we asked our customer at Microsoft to respond on our behalf.

Here’s what Joe Banks, who manages the e-discovery team at Microsoft, wrote about the issue and gave us permission to publish:

Statement from Microsoft:

In regard to the Declaration of Hojoon Hwang referenced in the 3/11/11 Order granting in part and denying in part plaintiff’s Motion to Compel in Datel Holdings LTD v. Microsoft, No.C-0905535EDL in the Northern District of California, the statement ‘For reasons still unknown to Defendant, Clearwell truncated some ‘Re-auth’ documents during processing’ should be corrected.  Microsoft subsequently learned that the cause of the truncation was the Microsoft software (AD/RMS Bulk Protection Tool) employed to decrypt previously encrypted content, and the truncation issue had nothing to do with Clearwell’s technology whatsoever.  Shortly after Mr. Hwang’s declaration was filed, he clarified – on the record in open court on February 22 – that Microsoft’s decryption process was the true cause of the data truncation:

“A lot of Microsoft documents, including e-mails, are encrypted when they are sent. And for production purposes, we have to decrypt it. In that process, some of the material got cut off.”

Microsoft does not use Clearwell technology to decrypt its data.  In actuality, Clearwell’s Engineering and Support teams were instrumental in helping to identify the root cause of the truncation issue.  Microsoft continues to use Clearwell’s processing and analysis technology on this matter and greatly appreciates the partnership and support Clearwell provides without fail.