Posts Tagged ‘search’

As the Electronic Discovery World Zurns

Wednesday, July 29th, 2009

Judge Grimm’s Victor Stanley case was lauded by many as one of the most significant electronic discovery cases of 2008, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators.

“[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.”

Despite legions of articles and blogs on the topic, at least certain portions of the bench haven’t taken heed.  In the case In re: Zurn Pex Plumbing Products Liability Litigation, 2009 U.S. Dist. LEXIS 47636 (June 5, 2009) (hereinafter “Zurn”), U.S. District Judge Ann Montgomery receives points for understanding some basic e-discovery tenets around recall and precision, but then mysteriously goes where “angels fear to tread” by suggesting her own search terms.

Examining the case facts in more detail: Zurn is a class action products liability case where discovery was bifurcated (as is often the case – see Spieker v. Quest Cherokee) to first cover the class “certification” component.  Initially, the Magistrate partially closed the door on broader ESI discovery, stating that “while ESI may prove to be relevant to the first stage of discovery, we cannot meaningfully make that prediction now, and require the parties to engage in what could be vastly more expensive, and yet utterly futile, discovery.”  However, the Magistrate didn’t shut the door entirely, suggesting that “should the parties uncover voids in the information disclosed in hard copy form, they are . . . at liberty to press for further discovery including electronically stored information.”

Despite complying with Sedona’s Cooperation Proclamation (“The parties have worked amicably throughout the discovery process”), opposing counsel still came to loggerheads when plaintiff found “voids” in the initial paper productions via third party discovery.  The plaintiff brought a motion to compel ESI discovery and the defendant objected, stating two primary arguments: (1) the Magistrate earlier ruled out ESI discovery and (2) if they had to perform ESI discovery it would be unduly burdensome/expensive.

Judge Montgomery summarily rejected the first argument, but was concerned about the burden surrounding the proposed ESI discovery.  Here, the calculations get a bit confusing, but plaintiff’s request would have resulted in 361 gigabytes of ESI from employee email sources, as well as shared “J” and “K” drives.  The defendant multiplied the gigabyte number by 75,000 pages per gigabyte (roughly 27 million pages) and claimed that the work would have required “approximately seventeen weeks and cost $ 1,150,000, exclusive of vendor collection and processing costs, to review and process the data.”  Assuming a rather modest $1,000 per gigabyte for processing and hosting costs, defendants could’ve added roughly another $360,000 for the project.
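The arithmetic behind these figures can be sketched in a few lines. This is only an illustration using the numbers quoted in the post; the function name and the $1,000 per gigabyte rate are assumptions for the example, not anything from the court record.

```python
# Figures quoted in the post. The per-gigabyte processing rate is the post's
# own "rather modest" assumption, not a number from the opinion.
GIGABYTES = 361
PAGES_PER_GB = 75_000
PROCESSING_RATE_PER_GB = 1_000   # dollars per gigabyte (assumed)
REVIEW_COST = 1_150_000          # dollars, Zurn's estimate, excluding vendor costs


def estimate_burden(gigabytes, pages_per_gb, rate_per_gb, review_cost):
    """Return page count, processing cost, and combined cost (hypothetical helper)."""
    pages = gigabytes * pages_per_gb
    processing = gigabytes * rate_per_gb
    return {
        "pages": pages,
        "processing": processing,
        "total": review_cost + processing,
    }


burden = estimate_burden(GIGABYTES, PAGES_PER_GB, PROCESSING_RATE_PER_GB, REVIEW_COST)
print(f"Pages to review:   {burden['pages']:,}")
print(f"Processing cost:   ${burden['processing']:,}")
print(f"Review + process:  ${burden['total']:,}")
```

The page count lands at about 27 million, which matches the larger of the two page figures the court mentions in the passage quoted below.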

Ultimately, the court was not persuaded by the supporting affidavits, nor the attorney’s representations about the resulting burden:

“It is unclear whether Zurn’s cost and time numbers are based on a review of 27 million pages of documents, the 3.6 million pages of documents limited to the J Drive and custodians’ emails, or a smaller sample of document pages likely to be flagged as a result of a search for certain relevant terms proposed by Plaintiffs. The affidavit of Ms. Freestone, an attorney and not an expert on document search and retrieval, is not compelling evidence that the search will be as burdensome as Zurn avers.”

The 361 gigabytes apparently resulted from “hits” corresponding to plaintiff’s 26 search terms.  The court correctly identified that those terms had precision issues (“many of Plaintiffs’ proposed search terms will likely produce a large number of ‘hits’ that have limited relevance in the case”).

Unfortunately, in an effort to increase the search precision, the Judge did not take heed of Judge Grimm’s warning and surprisingly took matters into her own hands: “the Court will limit the search to the following fourteen terms based on the likelihood that they will produce relevant documents without including a vast number of documents that are likely irrelevant to the litigation.”  Here is the Judge’s list of keywords:

(1) AADFW,
(2) Corrosion,
(3) Corrosive,
(4) Corrosive Water,
(5) Crack,
(6) De-zinc,
(7) Dezincification,
(8) DZR,
(9) Fail,
(10) IMR,
(11) Leak,
(12) MES,
(13) SCC,
(14) Stress corrosion cracking

Without looking at the underlying data, it’s clear from the outset that Judge Montgomery didn’t craft a good search strategy (as Judge Grimm might have predicted).  For example, terms 2, 3, 4 and 14 could’ve been captured by a single stemmed search using the term “corros*.” Without such a stemmed search approach, the terms would probably have been run singly in the proposed protocol, meaning that their result sets would’ve overlapped heavily, with the same documents hit again and again, thereby resulting in wasted attorney review time and processing costs.
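To make the stemming point concrete, here is a minimal sketch of how a trailing wildcard such as “corros*” behaves against the court’s fourteen terms. The regex translation and the three sample documents are assumptions invented for illustration; real review platforms have their own wildcard and stemming syntax.

```python
import re

# The court's fourteen terms, in the order listed in the opinion.
COURT_TERMS = [
    "AADFW", "Corrosion", "Corrosive", "Corrosive Water", "Crack", "De-zinc",
    "Dezincification", "DZR", "Fail", "IMR", "Leak", "MES", "SCC",
    "Stress corrosion cracking",
]


def wildcard_to_regex(pattern):
    """Translate a trailing-* wildcard (e.g. 'corros*') into a case-insensitive regex."""
    stem = re.escape(pattern.rstrip("*"))
    return re.compile(r"\b" + stem + r"\w*", re.IGNORECASE)


def covered_terms(pattern, terms):
    """Return the terms that a single wildcard pattern would already match."""
    rx = wildcard_to_regex(pattern)
    return [t for t in terms if rx.search(t)]


def hits_for_term(term, docs):
    """Return ids of documents containing the literal term as whole words."""
    rx = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return {doc_id for doc_id, text in docs.items() if rx.search(text)}


# Hypothetical documents, invented purely for this example.
DOCS = {
    1: "Field report: stress corrosion cracking found in the brass fitting.",
    2: "Corrosive water accelerated corrosion at the crimp ring.",
    3: "Lunch menu for the quarterly meeting.",
}

covered = covered_terms("corros*", COURT_TERMS)

# Running each term on its own counts the same document several times.
separate_hits = sum(len(hits_for_term(t, DOCS)) for t in covered)

# One wildcard search returns each matching document once.
stem_rx = wildcard_to_regex("corros*")
unique_hits = len({doc_id for doc_id, text in DOCS.items() if stem_rx.search(text)})

print(covered)
print(separate_hits, unique_hits)
```

On this toy corpus the four separate searches produce five hits across only two distinct documents, which is the kind of duplication a single stemmed term avoids.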

Judge Montgomery did recognize the potential error of her ways and gave the parties an out:

“The parties may decide on a different set of fourteen terms if they choose to do so. Additionally, if the search, as ordered by the Court, proves to be overly burdensome or costly, Zurn may renew its objection by presenting the Court with specific information including evidence from computer experts on applying the search terms, the number of documents identified, and the cost and time burdens of vetting documents.”

This “specific evidence” language seems to track notions from Sedona’s search best practices protocol, which prescribes sampling and iterative search term refinement.  What is surprising is that, knowing this, she would nevertheless blindly proffer the 14-term search strategy.  Instead, she should’ve quoted Victor Stanley and required the parties to come up with a data-driven approach that met requisite precision and recall metrics.
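For readers less familiar with the two metrics, precision is the share of retrieved documents that are actually relevant, and recall is the share of relevant documents that the search actually retrieved. A minimal sketch follows, using a hypothetical reviewed sample; the document ids and the function name are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a search measured against a coded sample."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = retrieved & relevant
    precision = len(true_positives) / len(retrieved) if retrieved else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample: ids a reviewer coded as relevant, and ids the search returned.
sample_relevant = {1, 2, 3, 4, 5}
search_hits = {1, 2, 3, 6, 7, 8, 9, 10}

precision, recall = precision_recall(search_hits, sample_relevant)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In an iterative protocol, the parties would adjust the terms, rerun the search on a fresh sample, and repeat until both numbers reach whatever thresholds they have agreed on.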

EDRM Continues Drive to Solve Practical Electronic Discovery Problems

Tuesday, June 23rd, 2009

As most electronic discovery veterans are aware, the EDRM Project is an effort founded five years ago by George Socha and Tom Gelbmann to bring together a community of e-discovery practitioners for the purpose of solving some of the industry’s most challenging problems.

It may be hard to believe, but there was a time in the very recent past when the iconic EDRM model did not yet exist. No multicolored boxes, no arrows, no sloping volume and relevance lines. Nothing. Coming up with a standard way of talking about electronic discovery was the first problem that the group set about solving, and I think it would be hard to argue with the fact that they came up with the gold standard: a simple, clear, concise model that, at least so far, is standing the test of time as a way of thinking about the flow of the e-discovery process.

With each passing year, the group has started to address a broader set of problems, all with a practical bent.  Currently, there are eight:

Project: Goal
  • Evergreen: Keep the EDRM model fresh and relevant as the industry grows and evolves
  • XML: Provide a standard, generally-accepted XML schema to facilitate the movement of electronically stored information from one step of the e-discovery process to the next
  • Metrics: Provide an effective means of measuring the time, money, and volumes associated with e-discovery activities
  • Code of Conduct: Develop aspirational voluntary ethical guidelines for e-discovery providers and consumers
  • Search: Provide a framework for defining and managing the various aspects of search as it applies to the e-discovery workflow
  • Data Set: Compile a 100 gigabyte public data set that can be used to test various aspects of e-discovery software and services
  • Jobs: Provide a professional resource for the e-discovery community and communicate about e-discovery related jobs
  • Information Management: Explore the emerging need for e-discovery standards in information management (the “upstream” part of the process)

This year’s annual EDRM conference took place back in May. After years of meeting in the same chilly and wind-swept location in downtown St. Paul, Minnesota, George and Tom had the brilliant idea of spicing up the meeting a bit by moving it to a more exotic locale: Bora Bora! Plans were set in motion, but quickly the overwhelming feedback came back from EDRM members: E-discovery is so fascinating, so heart-warming, that adding Bora Bora to the mix would simply be too much for the vast majority of the participants to bear. So St. Paul it was!

This was Clearwell’s third EDRM conference, and location aside, it’s been fascinating to see how it has changed over the last few years. Here are several notable trends from this year’s kickoff:

  • More participation from end-users: There was a definite increase in the number of end-user/consumer participants (that is, those not from the vendor community), particularly from law firms. This could be taken as further evidence that e-discovery is indeed moving in-house.
  • Increased enthusiasm to take on new challenges: One of the great things about EDRM is its willingness to try to tackle new areas that aren’t being directly addressed by some of the other (fantastic) organizations out there like Sedona. This was in evidence several years ago, when Clearwell was fortunate to get involved in the early stages of the EDRM XML project, which has proven to be a huge time, cost, and risk reducer for many in the industry by providing a common standard that can be used to move data within the e-discovery process. It was in evidence last year when Clearwell’s CTO was able to help launch a new effort around Search that is seeking to develop standards and best practices in an increasingly complex and contentious area. And, finally, it was in evidence this year with the launch of the Information Management project, a cutting-edge group that is exploring how to solve the challenges that e-discovery poses for information management – certainly a complex area in need of thought leadership.
  • Improved collaboration: One thing that has amazed us from day one is how collaborative EDRM is, and continues to become. There are a lot of e-discovery vendors involved who, outside of the confines of the St. Paul Hotel, aggressively compete in the marketplace. However, George and Tom have been able to create an environment at EDRM where competitive spirits are set aside and ideas can be cultivated which provide huge value across the e-discovery landscape (both vendor and consumer).

One final note: If you’re an e-discovery practitioner in a law firm or corporate setting, I’d encourage you to get connected, either informally (through the EDRM web site) or formally (by signing up for one or more of the projects). While end-user involvement continues to grow, there is definitely still a need for more non-vendor involvement. It is critical in ensuring real and relevant problems get solved, and to pushing the state of the art in e-discovery forward. Please join us!

Electronic Discovery Services: The Price is Right?

Wednesday, June 17th, 2009

Maybe this will show my age, but I’ve been around the electronic discovery business since the days when pricing was both simple and very expensive. Terabytes were at the mythical high-end of the spectrum and gigabytes of “e-docs” (not “ESI”) cost $3,000 – $4,000 to process. Understandably (and fortunately for most), pricing models have evolved, thanks in part to more educated consumers and initiatives such as Sedona’s RFP + Vendor Panel.

Leaving the WABAC machine and moving into present times, we’re starting to see some variance from traditional pricing models that primarily focus on data “into” the processing machine. More and more companies (such as Kroll Ontrack) are moving to models that price on data “out” of the process. Since that’s a bit nebulous, an example might illustrate:

Traditionally, in a somewhat simplified fashion, an electronic discovery project would be priced by the amount of data in the initial corpus (say 100 gigabytes) and processing would be priced at $500 a gigabyte (for round numbers purposes). Leaving out the sometimes significant caveat that the 100 gigabytes would likely increase due to expansion of compressed files, this would mean that the bulk of the project expenses would be $50,000 ($500 x 100), plus relatively nominal costs for monthly hosting and user access rights.

At the end of the day, after elimination of system files, deduplication and application of search terms (reducing the initial corpus by say 70% collectively) there would be 30 gigabytes remaining for hosting and possible production, both of which are most often priced separately.

Given rampant commoditization, there’s an arms race underway among certain service providers, who are now changing the above model to give away initial processing as a loss leader – pricing only on the data that comes out the end of the processing/search step. In this approach the above workflow would largely stay the same, but the vendor would charge a higher rate for what ultimately is hosted on the back-end. If this back-end fee were $2,000 per resulting gigabyte and the same 30 gigabytes came out the back end, then the customer would pay $60,000 for the project. But, if the deduplication, searching, culling, etc. were more effective (at say 80%) then the resulting 20 gigabytes would only cost $40,000.
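The two pricing models are easy to compare side by side. The sketch below uses only the round numbers from the example above ($500 per gigabyte in, $2,000 per gigabyte out, a 100 gigabyte corpus); the function names are hypothetical, and the break-even figure is simply derived from those numbers rather than taken from any vendor’s price list.

```python
def front_end_cost(input_gb, rate_in):
    """Traditional model: pay on every gigabyte that goes into processing."""
    return input_gb * rate_in


def back_end_cost(input_gb, cull_percent, rate_out):
    """Newer model: pay only on the gigabytes left after culling."""
    remaining_gb = input_gb * (100 - cull_percent) / 100
    return remaining_gb * rate_out


def break_even_cull_percent(rate_in, rate_out):
    """Cull rate (in percent) at which both models cost the same."""
    return 100 * (1 - rate_in / rate_out)


INPUT_GB = 100
RATE_IN = 500     # dollars per gigabyte in (round number from the post)
RATE_OUT = 2_000  # dollars per gigabyte out (round number from the post)

traditional = front_end_cost(INPUT_GB, RATE_IN)
at_70 = back_end_cost(INPUT_GB, 70, RATE_OUT)
at_80 = back_end_cost(INPUT_GB, 80, RATE_OUT)
break_even = break_even_cull_percent(RATE_IN, RATE_OUT)

print(f"Traditional model:      ${traditional:,.0f}")
print(f"Back-end model at 70%:  ${at_70:,.0f}")
print(f"Back-end model at 80%:  ${at_80:,.0f}")
print(f"Break-even cull rate:   {break_even:.0f}%")
```

With these rates the buyer only comes out ahead under the back-end model if culling removes more than three quarters of the corpus, which is exactly the bet the buyer is being asked to make.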

The question then, as Clint Eastwood would put it, is: “Do you feel lucky?” This pricing model forces attorneys and litigation support managers to guesstimate what culling, search, and de-duplication rates they’ll likely get on the data corpus. Guess right and they save the end client money, guess wrong and they’re way over budget.

The dynamics of this purchasing decision are a bit atypical because the buyer (usually counsel) doesn’t pay the bills, so the decision can often be more vexing than most. When a direct consumer gambles on pricing, things will ideally balance out over time, with money being saved in some instances and some being overspent in others. But, when the buyer doesn’t pay the bills the motivation is less clear.

Thoughts run to Maslow’s hierarchy of needs to determine which pricing model is ultimately more compelling: (a) price certainty/adherence to budget, or (b) cost variability and the opportunity to save money. While it’s never good to understate the upside of saving money (Esteem), I think ultimately there’s a more fundamental need (Safety) to stay within budget and avoid the painful (sometimes client-imperiling) call to discuss how a given e-discovery project has gone way over budget.

This calculation is made even more vexing because it not only pits the purchasing party against unknown data culling/searching rates, but also puts the vendor in an ethical bind: they make less money if they’re supremely effective at data reduction, whereas if they benefit, intentionally or accidentally, from relatively little data reduction, they stand to capture a ton of upside.

It’s like you went to Vegas to gamble your kid’s college fund and on top of the already questionable house odds you knew that the dealer stood to profit by your losses. So, as for myself, no, I don’t feel lucky.

Five Electronic Discovery Questions Regarding Inaccessibility With David Isom

Thursday, April 30th, 2009

David Isom and I have collaborated a number of times over the years on a variety of electronic discovery presentations and articles.  So, when I saw that California was proposing new state electronic discovery rules that had some interesting variances vis-à-vis the FRCP, I thought David might be able to give us the benefit of his unique and sage perspective.

1. David, as the author of the definitive piece about inaccessibility under the Federal Rules of Civil Procedure (The Burden of Discovering Inaccessible Electronically Stored Information: Rules 26(b)(2)(B) & 45(d)(1)(D)), how many litigators do you think really understand and use these provisions?

I sense that litigators with a basic understanding of the new electronic discovery rules know that the inaccessibility rule exists and provides some protection for parties against unduly burdensome discovery.  Few seem to have noticed that Rule 45 contains an inaccessibility provision whose language is similar to the Rule 26(b)(2)(B) inaccessibility protection for parties, but whose protections as applied to subpoenaed nonparties are greater than the protections for parties.  Here are the three most basic and exciting (or excruciating, depending upon your side of the fence) impacts of the new inaccessibility rules:

(1) The inaccessibility rule has completely changed a nonparty’s leverage to narrow subpoenas seeking electronically stored information (ESI).  Subpoenaed nonparties now have protection against fishing expedition subpoenas that did not exist before — to narrow subpoenas, or to require the payment of costs and attorney fees in responding to broad subpoenas.

(2) Cost-shifting, for parties as well as nonparties, is now controlled by the inaccessibility rules.  Several federal courts have recently held that discovery cost-shifting is allowed only if these inaccessibility rules provide for cost-shifting under the circumstances.

(3)  The inaccessibility rules must be asserted and asserted timely if they are to provide protection.  For example, after counsel for nonparty Office of Federal Housing Enterprise Oversight spent $6 million of our money responding to a subpoena in In re Fannie Mae Securities Litigation, 552 F.3d 814 (D.C. Cir. 2009), counsel tried to recover the money on an inaccessibility cost-shifting argument.  To which the United States District Court and the Court of Appeals for the District of Columbia said, in essence:  you might have had a good idea, and saved your client $6 million, had you raised the arguments before agreeing to produce the documents and spending all that money.  But you agreed to produce the ESI and cannot come back now and get any protection.  You should have studied the inaccessibility rule.

2. So, assuming we’re still early in the learning curve, do you think these FRCP provisions are really gaining traction either in practice or in the case law?

Judging by the number of reported decisions, the inaccessibility rules are receiving as much attention as the other new features of the federal electronic discovery rules.  Which, I suppose, is damnation by faint praise — a large percentage of the reported cases are about what should happen because lawyers didn’t understand or apply the rules properly. Cason-Merenda v. Detroit Medical Center, 2008 U.S. Dist. LEXIS 51962 (E.D. Mich. July 7, 2008) is a good example.  There, defendant’s counsel produced ESI without any objection and without pre-identifying the ESI as inaccessible.  After production, counsel tried to get their opponents to share the cost of producing the allegedly inaccessible ESI.  The court correctly held that the ESI must be identified as inaccessible in advance of the production to give the seeking party the option to decide whether the discovery is really worth the candle, especially given the prospect that the cost of production might be shifted to the seeking party.

3. What are your thoughts on the new California state provisions regarding “inaccessible” ESI where they’re proposing a different treatment and slightly different burden?  And, will this approach ultimately weaken responding parties’ abilities to make “inaccessible” claims successfully?

I am not an expert on California law, but am keenly interested in what the states are doing with electronic discovery.  As of this writing (May 2009), it appears that California Assembly Bill No. 5 has not yet been enacted.  Yet, here are some thoughts about how the inaccessibility provisions of this bill, if enacted, would compare to the federal rules of inaccessibility.  The bottom line is that the California bill is remarkably similar to the federal rules on inaccessibility issues.

Under the federal rules, a party seeking protection for inaccessibility initiates the process by “simply” (so far, the courts have tolerated fairly sparse identifications as satisfying this requirement) identifying the sources of information claimed to be not reasonably accessible because of undue burden or cost.  The subpoenaed nonparty seeking protection can initiate by identifying the ESI sought as not reasonably accessible in an objection, motion to quash or motion for protective order.  In the federal system, either the seeking party or the protecting party or nonparty can move to test the issue (one by a motion to compel, the other by a motion for protective order).

The California bill is nearly identical to the federal process.  The bill provides that a person resisting a subpoena for ESI on inaccessibility grounds may “oppose” the subpoena.  If this means that such a person can either object or move to quash or move for a protective order, it appears to be the same as the federal rule.  The California bill specifies that a party resisting a production request on inaccessibility grounds initiates protection by identifying the types or categories of sources of electronically stored information that it asserts are not reasonably accessible.  This is similar to the federal rule, whose text requires identification of “sources”, but whose committee notes clarify that merely “types or categories of sources” of inaccessible, responsive ESI need be identified.  California’s Legislative Counsel’s Digest indicates that the process for protecting inaccessible ESI, apparently for both parties and subpoenaed nonparties, can be initiated by moving for a protective order, or by opposing or objecting to the subpoena or request.

Even if there are any distinctions in the above processes, the two processes appear to merge thereafter.  In both systems, the motions to test inaccessibility must be preceded by a conference of counsel to attempt in good faith to resolve the issue, together with a certificate that such an attempt has been made.  In both, the person seeking protection has the burden of proving inaccessibility (this is even true in the federal system where the process is initiated by the seeker’s motion to compel).  In both systems, if the holding party proves inaccessibility, the burden shifts to the seeking party to show good cause for producing the ESI, despite its inaccessibility.

And in both, if good cause is shown, the court may still impose conditions upon production, including cost-shifting.  In both, the factors that the courts are to consider in determining good cause are similar — more accessible, less burdensome sources; cumulativeness of the discovery; whether the burden or expense of the discovery would outweigh the likely benefit of the discovery, considering such things as the importance of the issues, the amount in controversy and the resources of the parties.  One possible difference between the California bill and the federal rules on good cause is that the California bill requires the court to limit discovery if any of the listed factors exists, where the federal rules and committee notes seem to envision a pure balancing.

In sum, the California bill essentially adopts the federal approach.

Some confusion has arisen because California commentators have drawn a distinction between the California bill and a misinterpretation of the federal rules.  One commentator, for example, stated that “under the federal rules, if ESI is inaccessible, the responding party simply doesn’t need to produce such documents.”  This ignores the affirmative identification duty that I discussed above.

4. With the rapid advancements in ESI restoration technologies, which the Comments to the Rule anticipated, are backup tapes in your mind still “inaccessible”?

The rules make it clear that inaccessibility cannot be measured by technology category alone.  The test does not depend upon the type of technology involved, but upon the balancing of need, technology, importance, spoliation, relevance, alternative sources and potential benefit against overbreadth, burden and cost.  So, if backup tapes are the only source available for important, relevant information because more accessible relevant sources have been spoliated, backup tapes will not be deemed inaccessible.  Without spoliation, if relevant ESI is available on active sources, backup tapes may not be discoverable.

Perhaps the main reason that categories of technology cannot be deemed per se accessible or inaccessible is that the technology is changing so fast.  Many search tasks that were expensive and difficult five years ago are much more doable now.

5. Finally, what do you think the future holds for these FRCP sections?

The inaccessibility rules will continue to be the main battleground where the great debates about the value and cost of electronic discovery will be fought, since these rules are specifically tailored to balance all of the interests in that debate.

Some groups are claiming that electronic discovery is wasteful and expensive, and that the new rules exacerbate the problem.  Of course, the federal rules ought always to be analyzed for problems and need for improvement, but I haven’t heard informed, thoughtful, helpful suggestions for improvements to the federal rules in the recent debate.  Overall, I see the adoption of the federal rules as having helped reduce the cost of electronic discovery, not increased the cost.

A Gross Inability to Craft Electronic Discovery Searches

Thursday, April 9th, 2009

The bashing of our judicial system seems to have reached a fever pitch.  Groups like the American College of Trial Lawyers (“ACTL”) have proclaimed in a recent report that while the “civil justice system is not broken, it is in serious need of repair.”  The blame game seems to have judges and attorneys alike pointing fingers.  The Fellows of the ACTL (perhaps not surprisingly) seem to pin some of the blame on the judiciary:

“Judges should have a more active role at the beginning of a case in designing the scope of discovery and the direction and timing of the case all the way to trial. Where abuses occur, judges are perceived not to enforce the rules effectively.”

Groups like the Sedona Conference chalk up many of the ills to the failure to cooperate, so much so that they’ve orchestrated a cooperation proclamation – which has picked up enough support from the bench to have garnered several cites in the case law (see, e.g., Mancia).

The bench for its part seems to put some of the onus on litigators and their reluctance to get with the times.  William A. Gross Constr. Assocs., Inc. v. Am. Mfrs. Mut. Ins. Co., 2009 WL 724954 (S.D.N.Y. Mar. 19, 2009) is the latest example of such a proclamation.  In this construction defect case, Judge Peck (a Sedona devotee) issues what he hopes will be a “wake-up” call to the bar about the need for “careful thought, quality control, testing, and cooperation with opposing counsel in designing search terms or ‘keywords’ to be used to produce emails or other electronically stored information (‘ESI’).”  In Gross, the court had to mediate an e-discovery dispute where the requesting parties propounded a blatantly over-inclusive search request.  Unfortunately, the responding entity was a non-party, and it simply buried its head in the sand.  This left the Court in the “uncomfortable position” of having to craft a “keyword search methodology for the parties, without adequate information from the parties (and Hill)” in order to facilitate a resolution.

Judge Peck’s exasperation with these antics was palpable.  Summing up the problem by citing Judge Grimm and Victor Stanley he stated: “This case is just the latest example of lawyers designing keyword searches in the dark, by the seat of the pants, without adequate (indeed, here, apparently without any) discussion with those who wrote the emails.”  He further noted: “[w]hile this message has appeared in several cases from outside this Circuit, it appears that the message has not reached many members of our Bar.”

After noting both Sedona and Judge Facciola (of O’Keefe and Equity Analytics fame) Peck’s opinion reached a crescendo:

“Electronic discovery requires cooperation between opposing counsel and transparency in all aspects of preservation and production of ESI. Moreover, where counsel are using keyword searches for retrieval of ESI, they at a minimum must carefully craft the appropriate keywords, with input from the ESI’s custodians as to the words and abbreviations they use, and the proposed methodology must be quality control tested to assure accuracy in retrieval and elimination of ‘false positives.’ It is time that the Bar-even those lawyers who did not come of age in the computer era-understand this.”

While it’s easy to see who Peck blames in this brouhaha, it takes (at least) two to tango.  Meaning that litigants on both sides of the “v” must move beyond the typical “seat of the pants” electronic discovery wrangling.  And, judges need to be savvy enough to spot the issues to help/force the parties into such an enlightened/cooperative state.  Nothing short will get the job done.

Government Launches Bold New Recovery Effort

Tuesday, March 31st, 2009

While we don’t normally report news on the blog, this article seemed important enough to repost in its entirety…

SEEKING NEW AVENUE FOR COST-CUTTING, GOVERNMENT LAUNCHES BOLD NEW RECOVERY EFFORT

WASHINGTON — Senior Administration officials today took the wraps off of their latest effort to stabilize the American economy: The nationalization of the electronic discovery industry. According to a senior official who declined to be identified, “Even before the beginning of the current turmoil, everyone acknowledged that electronic discovery costs were out of control. Now, with litigation accelerating and corporate earnings plummeting, something had to be done. Without this action, a significant number of leading American corporations would be in danger of shutting their doors due to the overwhelming burden of e-discovery.”

A Single Common Portal

Effective immediately, all electronic discovery projects are being centralized under a single authority, the National Electronic Record Discovery Institute (NERDI). The Institute will be launching a nationwide electronic discovery portal on April 1, 2009 at www.ediscovery.gov. The site will build upon the recent success of the government’s economic recovery accountability site, www.recovery.gov. Said one Institute official, “Just drop the ‘r’ and insert a ‘dis’, and you get eDiscovery. It really is the next logical step in the government’s efforts to help the country in a time of profound need.”

Industry experts initially expressed skepticism about the government’s ability to make electronically discoverable information available in an efficient, expedient, and secure manner. Early plans had the government using the U.S. Postal Service and the network of I.R.S. tax return servicing centers as the logistical backbone for managing the collection and processing of documents. However, after negotiations with the National Security Agency, this step was eliminated from the process. Instead, all electronically-generated information in the United States will be instantly processed and made available through the ediscovery.gov site. Commented an NSA spokesman, “We have all the information anyway; why not make it easily accessible, instead of pretending it’s not here?” As for security, officials stated that “individuals can expect the same level of security and identity protection they’ve come to expect from their financial institutions and credit card companies, along with the additional protection and responsiveness they’ve come to expect from the Federal government.”

The Future of the E-Discovery Industry

What will become of the existing electronic discovery industry, made up of hundreds of individual vendors with aggregate revenue estimated to be in the $2-3 billion range? According to a senior-level NERDI director, “One word: toast.” However, a group of industry software vendors and service providers has expressed open skepticism about the ability of a historically incompetent, multilayered bureaucracy to deliver electronic discovery services more effectively than the competitive market.

One vendor pointed out that it will be “difficult for the government to establish itself as a credible player in electronic discovery with millions of White House emails still missing without a trace.” In response, the group of vendors that makes up the Top 5 Software and Service Provider lists on the 2008 Socha-Gelbmann survey (Autonomy, Clearwell, Fios, FTI, Guidance, Kroll, and LexisNexis) has announced an immediate consolidation of operations under the name ClearGuideAutoKrolLexFTios. Gloated new incoming CEO Rick Wagoner, “Our expectation is to roll over the government’s efforts like our new name rolls off your tongue.”

Time to Work Together on Electronic Discovery

Friday, February 27th, 2009

Cheesy Successories posters aside (for an alternative take, go here), the need to work together is much more than just a cliché in today’s environment.

In its recent brief on the five major trends that will shape business technology in 2009, leading management consultancy McKinsey and Company noted one trend in particular which highlights the urgent need for an organization’s IT and legal groups to forge better, faster, and more efficient ways of collaborating on electronic discovery issues:

Regulators demand more from IT

Government scrutiny of business will intensify in many developed countries. Already, in the United States, the Office of the Comptroller of the Currency weighs in on the resiliency of banking systems, the Food and Drug Administration (FDA) requires that many pharmaceutical systems be “validated,” and Sarbanes-Oxley drives decisions about accounting systems in every industry. In the future, policy makers and regulators will probably demand that IT systems capture more and better data in order to gain greater insight into and control over how banks manage risk, pharma companies manage drugs, and industrial companies affect the environment. Government officials also will monitor many legal and business rules more closely to ensure compliance with mandates. Successful CIOs should enhance their relationships with internal legal and corporate-affairs teams and be prepared to engage productively with regulators. They will need to seek solutions that meet government mandates at manageable cost and with minimal disruption.

- McKinsey Quarterly, February 2009

The current economic environment is creating a “Double Whammy” within almost every enterprise that has ongoing or pending electronic discovery issues (and are there many organizations left out there that don’t?):

  • As the McKinsey article notes, regulators will increasingly be demanding more from IT as government scrutiny of business intensifies. Look at the just-launched recovery.gov site to see the level of transparency and accountability that the government is aiming for with regard to the stimulus package. The bailout will not directly affect every business, but there is a new sheriff in town who will likely set the tone across the entire business landscape.
  • At the same time, there is relentless pressure on controlling costs. When times are tough, dollars that can be saved on the expense side are much more valuable than top-line revenue, since 100% of every dollar of cost savings goes directly to the bottom line.

The net-net: Enterprises will be forced to do more, with less.

How? With regard to electronic discovery, there is a lot of low-hanging fruit to be picked in the area of IT and legal cooperation:

  • In-house legal teams should meet with IT (if they aren’t already) to help them better understand the nature of electronic discovery, particularly as it applies to the more “upstream” parts of the process (specifically, identification, preservation, and collection) which IT tends to be more responsible for. Through a better understanding of the nature of electronic discovery, IT can improve its ability to find the right documents, avoiding over-collection and reducing downstream processing costs. In addition, new electronic discovery technologies are making it increasingly easy for legal to own more of the process, reducing the electronic discovery burden on IT.
  • Conversely, IT should coordinate with in-house legal teams to provide advice and mentoring as legal seeks to bring e-discovery platforms in-house to assist with early case assessment, search, culling, and analysis. To many legal teams, bringing e-discovery in-house may seem like a daunting proposition, but enterprise software has been around for a long time, and learning from IT’s experiences can make the process far less intimidating.

Yes, regulators are going to be far more demanding in the future than they have been in the past. But some simple collaboration and coordination between IT and legal will go a long way toward lightening the regulatory burden, especially as it pertains to electronic discovery.

The Electronic Discovery Sheriff Is Back In Town

Thursday, January 29th, 2009

As Tiger Woods is to golf, the Honorable Shira A. Scheindlin is to electronic discovery.  She has unquestionably been the most dominant/visible/outspoken jurist in the electronic discovery realm over the past decade, penning, among others, the Zubulake opinion, which is commonly referred to as the gold standard in electronic discovery.

But, like Woods, who recently took a sabbatical to mend his surgically repaired knee, Judge Scheindlin has recently been eclipsed by several other notable electronic discovery jurists, namely Judge Grimm (of Victor Stanley and Mancia fame) and Judge Facciola (aka “the Italian Stallion”), both of whom made numerous “best of the year” electronic discovery case law lists.

With Securities and Exchange Commission v. Collins & Aikman Corp., 2009 WL 94311 (S.D.N.Y., Jan. 13, 2009) Judge Scheindlin serves notice that the sheriff is back in town.  She not only tackles a number of thorny electronic discovery topics, but ambitiously takes on the US government in the process.  It’s a fairly lengthy opinion, well worth the read, so I’ll just excerpt a few of the notable takeaways.

As a bit of background…  the Collins case centered around a securities fraud complaint brought by the SEC against the Collins & Aikman Corp. and its former CEO David A. Stockman.  The crux of the dispute surrounded questions concerning the government’s discovery obligations in civil discovery (versus in a purely SEC investigation per se).

There were four distinct but interrelated disputes, namely:

“(1) Whether identifying responsive documents that have been organized by the producing party invades the protection accorded to attorney work-product and how a government agency-acting in its investigative capacity-must respond to a request for the production of documents. (2) Whether a government agency may unilaterally restrict the scope of its search based on an assertion of an “undue burden” on limited public resources. (3) How much information the Government must disclose in order to allow an adversary-and the court-to assess an objection based on the deliberative process privilege. (4) Whether a government agency may unilaterally exclude its own e-mail from document production on the ground that most-but not all-will be privileged.”

Addressing the work product claims, the court found against the government, again reinforcing several recent opinions about electronic discovery search:

“The SEC contends that Stockman can search through the ten million pages and find substantially the same documents identified by the SEC without impinging on the thought processes of the SEC attorneys. Indeed-at significant expense and delay-Stockman could search the document databases using appropriate search terms, but the inaccuracy of such searches is by now relatively well known.  A page-by-page manual review of ten million pages of records is strikingly expensive in both monetary and human terms and constitutes “undue hardship” by any definition.” [Citing George L. Paul and Jason R. Baron’s article, Information Inflation: Can the Legal System Adapt?]

After losing the first battle, the SEC argued that even if the compilations were not protected as work product, it could produce the “complete, unfiltered, and unorganized investigatory file” since this was how the documents were “maintained in the usual course of its business.”  This second attempt was similarly unpersuasive as Judge Scheindlin held that the “usual course of business” exemption did not apply:

“[C]onducting an investigation-which is by its very nature not routine or repetitive-cannot fall within the scope of the “usual course of business.” While the SEC routinely collects and maintains regulatory submissions such as 10-K reports, in its investigative capacity the agency conducts tailored probes of a company or an industry, requiring the gathering of records from diverse sources. Many if not most of the 1.7 million documents in the SEC production here were likely collected in the agency’s investigatory role. Thus it is no surprise that the complete collection is maintained as it was collected-in large disorderly databases. The documents can only be provided in a useful manner if the agency organizes or labels them to correspond to each demand.”

Next, Judge Scheindlin addressed the SEC’s decision to “unilaterally” limit its search to “centralized compilations” which ultimately “turned up nothing.”  She found that the SEC’s “blanket refusal to negotiate a workable search protocol” was “patently unreasonable” citing both Mancia and the Sedona Conference’s Cooperation Proclamation:

“Rule 26(f) requires the parties to hold a conference and prepare a discovery plan. … Had this been accomplished, the Court might not now be required to intervene in this particular dispute. I also draw the parties’ attention to the recently issued Sedona Conference Cooperation Proclamation, which urges parties to work in a cooperative rather than an adversarial manner to resolve discovery issues in order to stem the ‘rising monetary costs’ of discovery disputes.”

As the coup de grâce, Judge Scheindlin addressed and rejected out of hand the SEC’s most untenable claim that it would not produce e-mail “generated or received by the Commission itself” because “nearly all responsive e-mails will be privileged, protected, or non-substantive.”

“Because e-mails are inherently searchable, the SEC’s blanket refusal to produce any in-coming or outgoing e-mails is unacceptable. Without even an attempt to negotiate search terms that would weed out privileged, protected, or irrelevant e-mails, the SEC cannot reasonably assert that a routine aspect of modern discovery-search and review of a party’s e-mail-is beyond its capability. Essentially, the SEC’s position is that the cost of such a search is simply too high, but it has made no effort to document the cost or the likelihood that it would produce relevant, nonprivileged material. The concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.”
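The sampling idea in that passage can be made concrete with a short Python sketch: review a random sample, then use it to estimate how many relevant documents a full search would yield and what a full review would cost. Nothing here comes from the opinion itself. The function name `estimate_yield`, the made-up population of 100,000 e-mails, the stand-in reviewer callback, and the per-document cost are all assumptions chosen for illustration.

```python
import math
import random


def estimate_yield(doc_ids, is_relevant, sample_size, cost_per_doc, seed=0):
    """Estimate the relevant fraction and full-review cost from a random sample.

    doc_ids:      identifiers for the full population (hypothetical)
    is_relevant:  callback standing in for a human reviewer's decision
    cost_per_doc: assumed review cost per document, in dollars
    """
    rng = random.Random(seed)
    sample = rng.sample(doc_ids, sample_size)
    hits = sum(1 for d in sample if is_relevant(d))
    p = hits / sample_size
    # 95% margin of error, using the normal approximation to the binomial
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return {
        "sample_yield": p,
        "margin_of_error": margin,
        "estimated_relevant": round(p * len(doc_ids)),
        "full_review_cost": len(doc_ids) * cost_per_doc,
    }


# Made-up population: 100,000 e-mails, of which every 20th is relevant (5%).
emails = list(range(100_000))
report = estimate_yield(emails, lambda d: d % 20 == 0, 1_000, cost_per_doc=1.50)
print(report)
```

Reviewing 1,000 documents instead of 100,000 is what lets a party put numbers behind a burden argument rather than simply asserting that the cost is too high.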

At the end of the day, the Collins opinion seems to make the statement that Judge Scheindlin is back with a vengeance, and she’s serving notice that the government isn’t above the law:

“Like any ordinary litigant, the Government must abide by the Federal Rules of Civil Procedure.”

Besides knocking the government down a peg, Judge Scheindlin throws her judicial weight behind a number of important but nascent trends, including the Sedona Cooperation Proclamation, the related need to meet & confer, the use of sampling and the challenges of electronic discovery search. While none of these notions are groundbreaking, her substantial backing means increasing clarity for lawyers and litigation support practitioners everywhere.  And, that’s certainly welcome.

Federal Rule of Evidence 502: Help or Hype?

Thursday, November 13th, 2008

There’s a lot of excitement (and corresponding uncertainty) about the recent enactment of Federal Rule of Evidence 502 (FRE 502), which was signed into law on Sept. 19th.  The legal community is excited about FRE 502 mainly because of its potential to cut the cost of the e-discovery review process, which is routinely viewed as the most expensive part of the entire e-discovery process.

In combination with the codification of a national standard to determine when a privilege has been waived, FRE 502 is primarily designed to make the use of claw-back agreements a truly viable prospect when doing e-discovery privilege review.  It should provide some relief (ideally) from rapidly escalating e-discovery costs.  Or, at least that was the impetus behind the rule’s creation, according to the Comments:

“The proposed new rule facilitates discovery and reduces privilege-review costs by limiting the circumstances under which the privilege or protection is forfeited, which may happen if the privileged or protected information or material is produced in discovery. The burden and cost of steps to preserve the privileged status of attorney-client information and trial preparation materials can be enormous. Under present practices, lawyers and firms must thoroughly review everything in a client’s possession before responding to discovery requests. Otherwise they risk waiving the privileged status not only of the individual item disclosed but of all other items dealing with the same subject matter. This burden is particularly onerous when the discovery consists of massive amounts of electronically stored information.”

In short, FRE 502 is designed to establish uniform, nationwide standards for waiver of attorney-client privilege and work product protection, with the main goal being to protect producing parties against the inadvertent disclosure of privileged materials or work product in either federal or state proceedings.  The salient section is subsection (b) which states that when a disclosure of privileged information is made in a federal proceeding or to a federal agency, the disclosure does not constitute a waiver if:

  1. the disclosure is inadvertent;
  2. the holder of the privilege or protection took reasonable steps to prevent disclosure; and
  3. the holder promptly took reasonable steps to rectify the error, including (if applicable) following Federal Rule of Civil Procedure 26(b)(5)(B).

The end game here is presumably to lean more heavily on automated review methodologies to save costs.  But using this type of review methodology without taking on unhealthy levels of risk means that claw-back provisions must be as airtight as possible in case privileged electronically stored information (ESI) is inadvertently produced.  And yet, exactly how FRE 502 will work in practice is up for debate, since there isn’t any case law interpreting it yet.

One area that’s top of mind is how this new Rule will impact the recent decisions on e-discovery search, including the Victor Stanley case authored by Chief Magistrate Judge Grimm.  Since FRE 502 contains a core “reasonableness” prong in section (b) it’s likely that Grimm’s proclamation about e-discovery search will still be controlling.  Grimm fundamentally had to evaluate whether the producing party’s search protocols and procedures were in fact reasonable.

“Defendants, who bear the burden of proving that their conduct was reasonable for purposes of assessing whether they waived attorney-client privilege by producing the 165 documents to the Plaintiff, have failed to provide the court with information regarding: the keywords used; the rationale for their selection; the qualifications of M. Pappas and his attorneys to design an effective and reliable search and information retrieval method; whether the search was a simple keyword search, or a more sophisticated one, such as one employing Boolean proximity operators; or whether they analyzed the results of the search to assess its reliability, appropriateness for the task, and the quality of its implementation.” (footnotes omitted).

In Victor Stanley, the producing party wasn’t able to demonstrate reasonableness because it didn’t carefully plan its search strategy or conduct any sampling to make sure that the e-discovery search worked as designed.  This type of analysis would still seem to come into play under FRE 502 and so, as Grimm states, the use of either a best practices or collaborative approach to e-discovery would seem to be as important as ever.

Given that backdrop, it’s just as important as ever that parties “show their work” when it comes to e-discovery search.  Whether FRE 502 will really make parties feel safe enough to use automated review processes (thereby reducing costs) remains to be seen.  But unifying standards and expectations is at least a very positive first step.

Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth.  The myth is that there is an agreed upon definition of concept search.  In actuality, when people in e-discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not actually talking about concept search technology at all and are actually talking about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling.  First, culling needs to be transparent.  You need to be able to get agreement with or at least explain to the opposing side and the court exactly how you have culled the data set.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either agree on the set of search terms with the opposing side or to use e-discovery search best practices to perform keyword searches on your own.
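As a rough illustration of why keyword culling is easier to explain than a black-box categorizer, the Python sketch below culls a tiny document set against a list of search terms and records which documents each term hit. The function `cull_by_keywords`, the sample documents, and the terms are all hypothetical; real e-discovery platforms handle indexing, stemming, and file formats that this sketch ignores.

```python
import re


def cull_by_keywords(documents, terms):
    """Keep documents matching any term and report the hits for each term.

    documents: dict of document id -> text (made-up data)
    terms:     search terms, matched as whole words, case-insensitively
    """
    patterns = {
        t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms
    }
    hits_per_term = {t: [] for t in terms}
    kept = set()
    for doc_id, text in documents.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits_per_term[term].append(doc_id)
                kept.add(doc_id)
    return kept, hits_per_term


docs = {
    "D1": "Please review the hiring policy before the interview.",
    "D2": "Lunch on Friday?",
    "D3": "The severance terms follow our termination policy.",
    "D4": "Policy renewal for the office insurance.",
}
kept, hit_report = cull_by_keywords(docs, ["hiring", "termination policy"])
print(sorted(kept))
print(hit_report)
```

The per-term hit list is the point: it is the kind of record that can be shown to the opposing side or the court to explain exactly why each document stayed in or dropped out.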

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that your group of documents that have been categorized as being related to, for example, a company’s hiring policies include all of the documents in your matter related to hiring policies or that they do not include some documents that may not really be related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.
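The “terminate” example can be worked through in a few lines of Python. In this sketch a hand-written synonym list stands in for whatever expansion a concept search engine might perform, which is a simplifying assumption and not a description of how any actual product works. The six documents, the relevance labels, and the function names are invented for illustration.

```python
def search(documents, terms):
    """Return ids of documents containing any of the terms as a whole word."""
    wanted = set(terms)
    found = set()
    for doc_id, text in documents.items():
        words = set(text.lower().replace(".", " ").replace(",", " ").split())
        if words & wanted:
            found.add(doc_id)
    return found


def precision_recall(found, relevant):
    """Precision: share of found docs that are relevant. Recall: share of relevant docs found."""
    true_hits = len(found & relevant)
    precision = true_hits / len(found) if found else 0.0
    recall = true_hits / len(relevant) if relevant else 0.0
    return precision, recall


docs = {
    "E1": "We will terminate his contract on Friday.",
    "E2": "HR decided to fire the regional manager.",
    "E3": "The board voted to dismiss two employees.",
    "E4": "The fire department inspected the warehouse.",
    "E5": "Facilities reported a small fire in the kitchen.",
    "E6": "Please dismiss the reminder about the picnic.",
}
relevant = {"E1", "E2", "E3"}  # documents actually about employee termination

keyword_hits = search(docs, ["terminate"])
expanded_hits = search(docs, ["terminate", "fire", "dismiss"])

print("keyword :", precision_recall(keyword_hits, relevant))
print("expanded:", precision_recall(expanded_hits, relevant))
```

On this toy data the plain keyword search finds only one of the three relevant documents but returns nothing irrelevant, while the expanded search finds all three and pulls in the three fire-department, kitchen-fire, and picnic documents along with them.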

So concept search can help address the under-inclusive problem with keyword search (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with content categorization.  These technologies have largely been designed as “black boxes”, which, as I have discussed in the past, makes sense for enterprise search but not for e-discovery search, and, as a result, could also be difficult to explain and defend.  For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.