Archive for the ‘keyword search’ Category

LTNY Wrap-Up – What Did We Learn About eDiscovery?

Friday, February 10th, 2012

Now that that dust has settled, the folks who attended LegalTech New York 2012 can try to get to the mountain of emails that accumulated during the event that was LegalTech. Fortunately, there was no ice storm this year, and for the most part, people seemed to heed my “what not to do at LTNY” list. I even found the Starbucks across the street more crowded than the one in the hotel. There was some alcohol-induced hooliganism at a vendor’s party, but most of the other social mixers seemed uniformly tame.

Part of Dan Patrick’s syndicated radio show features a “What Did We Learn Today?” segment, and that inquiry seems fitting for this year’s LegalTech.

  • First of all, the prognostications about buzzwords were spot on, with no shortage of cycles spent on predictive coding (aka Technology Assisted Review). The general session on Monday, hosted by Symantec, had close to a thousand attendees on the edge of their seats to hear Judge Peck, Maura Grossman and Ralph Losey wax eloquently about the ongoing man versus machine debate. Judge Peck uttered a number of quotable sound bites, including the quote of the day: “Keyword searching is absolutely terrible, in terms of statistical responsiveness.” Stay tuned for a longer post with more comments from the General session.
  • Ralph Losey went one step further when commenting on keyword search, stating: “It doesn’t work,… I hope it’s been discredited.” A few have commented that this lambasting may have gone too far, and I’d tend to agree.  It’s not that keyword search is horrific per se. It’s just that its efficacy is limited and the hubris of the average user, who thinks eDiscovery search is like Google search, is where the real trouble lies. It’s important to keep in mind that all these eDiscovery applications are just like tools in the practitioners’ toolbox and they need to be deployed for the right task. Otherwise, the old saw (pun intended) that “when you’re a hammer everything looks like a nail” will inevitably come true.
  • This year’s show also finally put a nail in the coffin of the human review process as the eDiscovery gold standard. That doesn’t mean that attorneys everywhere will abandon the linear review process any time soon, but hopefully it’s becoming increasingly clear that the “evil we know” isn’t very accurate (on top of being very expensive). If that deadly combination doesn’t get folks experimenting with technology assisted review, I don’t know what will.
  • Information governance was also a hot topic, only paling in comparison to Predictive Coding. A survey Symantec conducted at the show indicated that this topic is gaining momentum, but still has a ways to go in terms of action. While 73% of respondents believe an integrated information governance strategy is critical to reducing information risk, only 19% have implemented a system to help them with the problem. This gap presumably indicates a ton of upside for vendors who have a good, attainable information governance solution set.
  • The Hilton still leaves much to be desired as a host location. As they say, familiarity breeds contempt, and for those who’ve notched more than a handful of LegalTech shows, the venue can feel a bit like the movie Groundhog Day, but without Bill Murray. Speculation continues to run rampant about a move to the Javits Center, but the show would likely need to expand pretty significantly before ALM would make the move. And, if there ever was a change, people would assuredly think back with nostalgia on the good old days at the Hilton.
  • Despite the bright lights and elevator advertisement trauma, the mood seemed pretty ebullient, with tons of partnerships, product announcements and consolidation. This positive vibe was a nice change after the last two years when there was still a dark cloud looming over the industry and economy in general.
  • Finally, this year’s show also seemed to embrace social media in a way that it hadn’t done so in years past. Yes, all the social media vehicles were around in years past, but this year many of the vendors’ campaigns seemed to be much more integrated. It was funny to see even the most technically resistant lawyers log in to Twitter (for the first time) to post comments about the show as a way to win premium vendor swag. Next year, I’m sure we’ll see an even more pervasive social media influence, which is a bit ironic given the eDiscovery challenges associated with collecting and reviewing social media content.

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This blog explores the promise of predictive coding technology, why predictive coding has not been widely adopted in eDiscovery, and explains why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page by page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or day dream, many believe computers can even predict document relevance better than their human counterparts.

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically relevant sample is not always intuitive to the end user.  This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if it is inadvertently produced to the opposing party.

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern what language  led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive number of irrelevant documents.  This problem is generally referred to as ‘excessive recall,’ and it is important because this lack of precision increases the number of documents requiring manual review which directly impacts eDiscovery cost.

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology that hasn’t existed until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.

Lessons Learned for 2012: Spotlighting the Top eDiscovery Cases from 2011

Tuesday, January 3rd, 2012

The New Year has now dawned and with it, the certainty that 2012 will bring new developments to the world of eDiscovery.  Last month, we spotlighted some eDiscovery trends for 2012 that we feel certain will occur in the near term.  To understand how these trends will play out, it is instructive to review some of the top eDiscovery cases from 2011.  These decisions provide a roadmap of best practices that the courts promulgated last year.  They also spotlight the expectations that courts will likely have for organizations in 2012 and beyond.

Issuing a Timely and Comprehensive Litigation Hold

Case: E.I. du Pont de Nemours v. Kolon Industries (E.D. Va. July 21, 2011)

Summary: The court issued a stiff rebuke against defendant Kolon Industries for failing to issue a timely and proper litigation hold.  That rebuke came in the form of an instruction to the jury that Kolon executives and employees destroyed key evidence after the company’s preservation duty was triggered.  The jury responded by returning a stunning $919 million verdict for DuPont.

The spoliation at issue occurred when several Kolon executives and employees deleted thousands emails and other records relevant to DuPont’s trade secret claims.  The court laid the blame for this destruction on the company’s attorneys and executives, reasoning they could have prevented the spoliation through an effective litigation hold process.  At issue were three hold notices circulated to the key players and data sources.  The notices were all deficient in some manner.  They were either too limited in their distribution, ineffective since they were prepared in English for Korean-speaking employees, or too late to prevent or otherwise ameliorate the spoliation.

The Lessons for 2012: The DuPont case underscores the importance of issuing a timely and comprehensive litigation hold notice.  As DuPont teaches, organizations should identify what key players and data sources may have relevant information.  A comprehensive notice should then be prepared to communicate the precise hold instructions in an intelligible fashion.  Finally, the hold should be circulated immediately to prevent data loss.

Organizations should also consider deploying the latest technologies to help effectuate this process.  This includes an eDiscovery platform that enables automated legal hold acknowledgements.  Such technology will allow custodians to be promptly and properly apprised of litigation and thereby retain information that might otherwise have been discarded.

Another Must-Read Case: Haraburda v. Arcelor Mittal U.S.A., Inc. (D. Ind. June 28, 2011)

Suspending Document Retention Policies

Case: Viramontes v. U.S. Bancorp (N.D. Ill. Jan. 27, 2011)

Summary: The defendant bank defeated a sanctions motion because it modified aspects of its email retention policy once it was aware litigation was reasonably foreseeable.  The bank implemented a retention policy that kept emails for 90 days, after which the emails were overwritten and destroyed.  The bank also promulgated a course of action whereby the retention policy would be promptly suspended on the occurrence of litigation or other triggering event.  This way, the bank could establish the reasonableness of its policy in litigation.  Because the bank followed that procedure in good faith, it was protected from court sanctions under the Federal Rules of Civil Procedure 37(e) “safe harbor.”

The Lesson for 2012: As Viramontes shows, an organization can be prepared for eDiscovery disputes by timely suspending aspects of its document retention policies.  By modifying retention policies when so required, an organization can develop a defensible retention procedure and be protected from court sanctions under Rule 37(e).

Coupling those procedures with archiving software will only enhance an organization’s eDiscovery preparations.  Effective archiving software will have a litigation hold mechanism, which enables an organization to suspend automated retention rules.  This will better ensure that data subject to a preservation duty is actually retained.

Another Must-Read Case: Micron Technology, Inc. v. Rambus Inc., 645 F.3d 1311 (Fed. Cir. 2011)

Managing the Document Collection Process

Case: Northington v. H & M International (N.D.Ill. Jan. 12, 2011)

Summary: The court issued an adverse inference jury instruction against a company that destroyed relevant emails and other data.  The spoliation occurred in large part because legal and IT were not involved in the collection process.  For example, counsel was not actively engaged in the critical steps of preservation, identification or collection of electronically stored information (ESI).  Nor was IT brought into the picture until 15 months after the preservation duty was triggered. By that time, rank and file employees – some of whom were accused by the plaintiff of harassment – stepped into this vacuum and conducted the collection process without meaningful oversight.  Predictably, key documents were never found and the court had little choice but to promise to inform the jury that the company destroyed evidence.

The Lesson for 2012: An organization does not have to suffer the same fate as the company in the Northington case.  It can take charge of its data during litigation through cooperative governance between legal and IT.  After issuing a timely and effective litigation hold, legal should typically involve IT in the collection process.  Legal should rely on IT to help identify all data sources – servers, systems and custodians – that likely contain relevant information.  IT will also be instrumental in preserving and collecting that data for subsequent review and analysis by legal.  By working together in a top-down fashion, organizations can better ensure that their eDiscovery process is defensible and not fatally flawed.

Another Must-Read Case: Green v. Blitz U.S.A., Inc. (E.D. Tex. Mar. 1, 2011)

Using Proportionality to Dictate the Scope of Permissible Discovery

Case: DCG Systems v. Checkpoint Technologies (N.D. Ca. Nov. 2, 2011)

The court adopted the new Model Order on E-Discovery in Patent Cases recently promulgated by the U.S. Court of Appeals for the Federal Circuit.  The model order incorporates principles of proportionality to reduce the production of email in patent litigation.  In adopting the order, the court explained that email productions should be scaled back since email is infrequently introduced as evidence at trial.  As a result, email production requests will be restricted to five search terms and may only span a defined set of five custodians.  Furthermore, email discovery in DCG Systems will wait until after the parties complete discovery on the “core documentation” concerning the patent, the accused product and prior art.

The Lesson for 2012: Courts seem to be slowly moving toward a system that incorporates proportionality as the touchstone for eDiscovery.  This is occurring beyond the field of patent litigation, as evidenced by other recent cases.  Even the State of Utah has gotten in on the act, revising its version of Rule 26 to require that all discovery meet the standards of proportionality.  While there are undoubtedly deviations from this trend (e.g., Pippins v. KPMG (S.D.N.Y. Oct. 7, 2011)), the clear lesson is that discovery should comply with the cost cutting mandate of Federal Rule 1.

Another Must-Read Case: Omni Laboratories Inc. v. Eden Energy Ltd [2011] EWHC 2169 (TCC) (29 July 2011)

Leveraging eDiscovery Technologies for Search and Review

Case: Oracle America v. Google (N.D. Ca. Oct. 20, 2011)

The court ordered Google to produce an email that it previously withheld on attorney client privilege grounds.  While the email’s focus on business negotiations vitiated Google’s claim of privilege, that claim was also undermined by Google’s production of eight earlier drafts of the email.  The drafts were produced because they did not contain addressees or the heading “attorney client privilege,” which the sender later inserted into the final email draft.  Because those details were absent from the earlier drafts, Google’s “electronic scanning mechanisms did not catch those drafts before production.”

The Lesson for 2012: Organizations need to leverage next generation, robust technology to support the document production process in discovery.  Tools such as email analytical software, which can isolate drafts and offer to remove them from production, are needed to address complex production issues.  Other technological capabilities, such as Near Duplicate Identification, can also help identify draft materials and marry them up with finals that have been marked as privileged.  Last but not least, technology assisted review has the potential of enabling one lawyer to efficiently complete the work that previously took thousands of hours.  Finding the budget and doing the research to obtain the right tools for the enterprise should be a priority for organizations in 2012.

Another Must-Read Case: J-M Manufacturing v. McDermott, Will & Emery (CA Super. Jun. 2, 2011)

Conclusion

There were any number of other significant cases from 2011 that could have made this list.  We invite you to share your favorites in the comments section or contact us directly with your feedback.

For more on the cases discussed above, watch this video:

Top Ten eDiscovery Predictions for 2012

Thursday, December 8th, 2011

As 2011 comes quickly to a close we’ve attempted, as in years past, to do our best Carnac impersonation and divine the future of eDiscovery.  Some of these predictions may happen more quickly than others, but it’s our sense that all will come to pass in the near future – it’s just a matter of timing.

  1. Technology Assisted Review (TAR) Gains Speed.  The area of Technology Assisted Review is very exciting since there are a host of emerging technologies that can help make the review process more efficient, ranging from email threading, concept search, clustering, predictive coding and the like.  There are two fundamental challenges however.  First, the technology doesn’t work in a vacuum, meaning that the workflows need to be properly designed and the users need to make accurate decisions because those judgment calls often are then magnified by the application.  Next, the defensibility of the given approach needs to be well vetted.  While it’s likely not necessary (or practical) to expect a judge to mandate the use of a specific technological approach, it is important for the applied technologies to be reasonable, transparent and auditable since the worst possible outcome would be to have a technology challenged and then find the producing party unable to adequately explain their methodology.
  2. The Custodian-Based Collection Model Comes Under Stress. Ever since the days of Zubulake, litigants have focused on “key players” as a proxy for finding relevant information during the eDiscovery process.  Early on, this model worked particularly well in an email-centric environment.  But, as discovery from cloud sources, collaborative worksites (like SharePoint) and other unstructured data repositories continues to become increasingly mainstream, the custodian-oriented collection model will become rapidly outmoded because it will fail to take into account topically-oriented searches.  This trend will be further amplified by the bench’s increasing distrust of manual, custodian-based data collection practices and the presence of better automated search methods, which are particularly valuable for certain types of litigation (e.g., patent disputes, product liability cases).
  3. The FRCP Amendment Debate Will Rage On – Unfortunately Without Much Near Term Progress. While it is clear that the eDiscovery preservation duty has become a more complex and risk laden process, it’s not clear that this “pain” is causally related to the FRCP.  In the notes from the Dallas mini-conference, a pending Sedona survey was quoted referencing the fact that preservation challenges were increasing dramatically.  Yet, there isn’t a consensus viewpoint regarding which changes, if any, would help improve the murky problem.  In the near term this means that organizations with significant preservation pains will need to better utilize the rules that are on the books and deploy enabling technologies where possible.
  4. Data Hoarding Increasingly Goes Out of Fashion. The war cry of many IT professionals that “storage is cheap” is starting to fall on deaf ears.  Organizations are realizing that the cost of storing information is just the tip of the iceberg when it comes to the litigation risk of having terabytes (and conceivably petabytes) of unstructured, uncategorized and unmanaged electronically stored information (ESI).  This tsunami of information will increasingly become an information liability for organizations that have never deleted a byte of information.  In 2012, more corporations will see the need to clean out their digital houses and will realize that such cleansing (where permitted) is a best practice moving forward.  This applies with equal force to the US government, which has recently mandated such an effort at President Obama’s behest.
  5. Information Governance Becomes a Viable Reality.  For several years there’s been an effort to combine the reactive (far right) side of the EDRM with the logically connected proactive (far left) side of the EDRM.  But now, a number of surveys have linked good information governance hygiene with better response times to eDiscovery requests and governmental inquires, as well as a corresponding lower chance of being sanctioned and the ability to turn over less responsive information.  In 2012, enterprises will realize that the litigation use case is just one way to leverage archival and eDiscovery tools, further accelerating adoption.
  6. Backup Tapes Will Be Increasingly Seen as a Liability.  Using backup tapes for disaster recovery/business continuity purposes remains a viable business strategy, although backing up to tape will become less prevalent as cloud backup increases.  However, if tapes are kept around longer than necessary (days versus months) then they become a ticking time bomb when a litigation or inquiry event crops up.
  7. International eDiscovery/eDisclosure Processes Will Continue to Mature. It’s easy to think of the US as dominating the eDiscovery landscape. While this is gospel for us here in the States, international markets are developing quickly and in many ways are ahead of the US, particularly with regulatory compliance-driven use cases, like the UK Bribery Act 2010.  This fact, coupled with the menagerie of international privacy laws, means we’ll be less Balkanized in our eDiscovery efforts moving forward since we do really need to be thinking and practicing globally.
  8. Email Becomes “So 2009” As Social Media Gains Traction. While email has been the eDiscovery darling for the past decade, it’s getting a little long in the tooth.  In the next year, new types of ESI (social media, structured data, loose files, cloud context, mobile device messages, etc.) will cause headaches for a number of enterprises that have been overly email-centric.  Already in 2011, organizations are finding that other sources of ESI like documents/files and structured data are rivaling email in importance for eDiscovery requests, and this trend shows no signs of abating, particularly for regulated industries. This heterogeneous mix of ESI will certainly result in challenges for many companies, with some unlucky ones getting sanctioned because they ignored these emerging data types.
  9. Cost Shifting Will Become More Prevalent – Impacting the “American Rule.” For ages, the American Rule held that producing parties had to pay for their production costs, with a few narrow exceptions.  Next year we’ll see even more courts award winning parties their eDiscovery costs under 28 U.S.C. §1920(4) and Rule 54(d)(1) FRCP. Courts are now beginning to consider the services of an eDiscovery vendor as “the 21st Century equivalent of making copies.”
  10. Risk Assessment Becomes a Critical Component of eDiscovery. Managing risk is a foundational underpinning for litigators generally, but its role in eDiscovery has been a bit obscure.  Now, with the tremendous statistical insights that are made possible by enabling software technologies, it will become increasingly important for counsel to manage risk by deciding what types of error/precision rates are possible.  This risk analysis is particularly critical for conducting any variety of technology assisted review process since precision, recall and f-measure statistics all require a delicate balance of risk and reward.

Accurately divining the future is difficult (some might say impossible), but in the electronic discovery arena many of these predictions can happen if enough practitioners decide they want them to happen.  So, the future is fortunately within reach.

What a Difference a Year (or Two) Makes in Electronic Discovery

Thursday, August 5th, 2010

August just wouldn’t be August without lazy days at the beach spent playing in the sand, frolicking in the surf, and immersing yourself in the LTN executive summary of the latest Socha-Gelbmann Electronic Discovery report (in this case, the hot-off-the-presses 2010 edition).

Even with the lure of the big waves beckoning you out into the water, if you follow electronic discovery you likely have a hard time pulling yourself away from the report, and this year is no exception. In fact, this year’s report is especially insightful, as George and Tom seem to have done a particularly impressive job of getting the pulse of not just what’s going on in the law firm and service provider parts of the market, but the enterprise as well.

This is a big change from just a couple of years ago. Go back and review the executive summary from 2008, and you’ll notice a very different feel to the findings. In 2008, much of the talk was around the dynamics of the service provider market, with relatively little discussion of trends related to the e-discovery process and technological innovation in the space. In 2008, it felt like e-discovery was something you had other people do for you: the word “consumer” appeared 12 times in the executive summary. In 2010, two short years later? Just five times. Why? The language may be telling. “Cost” appeared seven times in the 2008 report. In the 2010 report? 16… more than twice as often.

What seems to have happened is that the recession has been something of a refining fire for the electronic discovery market. In order to reduce costs and manage risks, enterprises are behaving much less like consumers and more like real customers with skin (and money) in the game. Not surprisingly, they’ve gotten extremely aggressive about bringing  innovative cost-containing measures to bear on the process. Socha and Gelbmann highlight three:

  • More targeted preservation and collection of ESI
  • More focused review and analysis of the data
  • More effective use of technology to speed up the efforts, improve quality, and reduce costs

This is great news for innovative software companies in the e-discovery space — and their customers. What one would expect to occur in a maturing market is that it would move from a period of rapid innovation to a lower-innovation, consolidation phase. However, that’s not the case here. While there is consolidation occurring,  what’s remarkable about e-discovery right now isn’t really all the acquisition press releases in your twitter feed (mainly from vendors saddled with prior-generation point solutions who are trying to acquire their way toward a complete offering). Rather, it’s how leading enterprises are increasingly seeking, and finding, cutting-edge solutions to solve cost, efficiency, and risk management problems associated with e-discovery that simply weren’t available prior to the meltdown.

As in-house legal and IT e-discovery spending starts to gain steam, look for enterprises purchasing in-house solutions to demand many of the innovations that have been developed over the last couple of years (most of which are highlighted by the Socha-Gelbmann survey):

  • Targeted collection: Products better able to strategically target the collection of ESI, rather than attempting to boil the ocean, are more suited to the mindset and approach of cost-conscious enterprises
  • Iterative discovery: Products that are able to provide “to the left” functionality while still providing enterprise-class, intuitive processing, analysis, review, and production functionality
  • Support for small and big cases: In discussing “small is the new big”, Socha and Gelbmann highlight how “the aggregate of small cases dwarfs the combined large cases.” Successful products must simultaneously handle high numbers of smaller cases while still scaling to the largest matters
  • Integrated analytics: Products must bring to bear powerful analytics across all stages of the e-discovery process, focused not just on document review, but also looking at aggregates of data from many different angles and allowing you to see the big picture across the entire case for effective information and cost management

Is the EDD space maturing? Yes, as Socha and Gelbmann rightfully point out. But it’s doing so in surprising, innovative ways that, when it’s all over, may well prove to be a silver lining to the cloud of challenges the industry has faced over the last two years.

Learn More On Electronic Discovery Litigation.

The Federal Rules of California

Thursday, September 17th, 2009

On of August 14, 2009, the California Judicial Counsel amended their Rules of Court to augment discussion of electronic discovery issues during the meet and confer process.

Rule of Court 3.724 was amended to require discussion of “Any issues relating to the discovery of electronically stored information” no later than 30 calendar days before the date set for the initial case management conference.  The broad language (i.e., “any”) was augmented by eight specific categories that must be expressly discussed:

(A) Issues relating to the preservation of discoverable electronically stored information;

(B) The form or forms in which information will be produced;

(C) The time within which the information will be produced;

(D) The scope of discovery of the information;

(E) The method for asserting or preserving claims of privilege or attorney work product, including whether such claims may be asserted after production;

(F) The method for asserting or preserving the confidentiality, privacy, trade secrets, or proprietary status of information relating to a party or person not a party to the civil proceedings;

(G) How the cost of production of electronically stored information is to be allocated among the parties;

(H) Any other issues relating to the discovery of electronically stored information, including developing a proposed plan relating to the discovery of the information;

Many of these issues track FRCP language (including forms of production, preservation, privilege issues, etc.).  However, section G seems somewhat novel given the historical “American Rule” where the producing party is required to bear all necessary costs of production.

Curiously missing, in comparison with FRCP 26 B(2)(b), is the need to discuss the handling of “inaccessible” ESI, although this could easily be subsumed in the “any other issues” language of section H.  Also missing is a discussion about proposed searching and/culling protocols (aka “keyword negotiations”) which are often part of the core meet and confer topics in Federal court.

Nevertheless, the scope is broad enough to require *a* discussion of all likely relevant electronic discovery issues, which was often lacking historically.  Once that discussion starts, reasonably savvy counsel should be able to flesh out most of the significant issues.  And, given this broad language a judge would presumably give them a hard time for any material omissions.

Learn More On: Frcp Electronic Discovery.

Top 5 Cases That Shaped Electronic Discovery in 2008

Friday, December 12th, 2008

Picking five out of the sea of electronic discovery cases isn’t as easy as it sounds.  Sure, a few, like our “Case of the Year” will be no-brainers, but others aren’t as clear cut.  And, they’re certainly open to debate.  But, in my humble opinion here’s THE list, counting down David Letterman style:

5) Mancia v. Mayflower Textile Servs. Co., 2008 WL 4595175 (D. Md. Oct. 15, 2008)

If there ever was an opinion written by a judge to make a larger societal point, Mancia was certainly it.  Judge Paul Grimm, who’ll appear on this list in another slot as well, has clearly taken the mantle from Judge Scheindlin as the leading electronic discovery jurist.  He’d heretofore authored a number of significant opinions in this area, including Hobson and Thompson. Now, in Mancia he used a garden variety discovery dispute, which was typically rife with boilerplate objections and other obstreperous tactics, to highlight the Sedona Conference’s Cooperation Proclamation.

The lasting takeaway from the opinion is the notion that “[c]ourts repeatedly have noted the need for attorneys to work cooperatively to conduct electronic discovery, and sanctioned lawyers and parties for failing to do so.” To support this notion he cites the Sedona Conference Proclamation and the little used FRCP 26(g).  This opinion is noteworthy because it gives precedent to bolster the Sedona initiative and should provide a ready citation for all those counsel who aren’t getting the level of cooperation they need from the opposition.  It remains to be seen if other judges will follow suit, but this could be the beachhead for a more cooperative electronic discovery process in 2009 and beyond.

4) Flagg v. City of Detroit, 252 F.R.D. 346 (E.D. Mich. 2008)

Flagg highlights the growing need to reconcile the electronic discovery landscape, which typically focuses somewhat myopically on email, with the larger informational trends which are now categorized by the use of blogs, social networking sites, instant messaging, and text messaging.  Flagg was one of the first to determine text messages (e.g., messages exchanged among certain officials and employees of the City of Detroit via city-issued text messaging devices) were discoverable under the standards of FRCP 26(b)(1).  The holding further demonstrated the challenges of conducting electronic discovery across information systems that mix personal information with business communications.  This type of information commingling will continue to escalate, causing significant long term electronic discovery challenges due to thorny privacy, privilege and policy implications.

3) Rhoads Indus., Inc. v. Bldg. Materials Corp. of Am., 2008 WL 4916026 (E.D. Pa. Nov. 14, 2008)

Rhoads is one of the first cases post Federal Rule of Evidence (FRE) 502, which recently created a national standard (versus the previous split in jurisdictions) and now states a “middle ground” for the determining of inadvertent disclosure during electronic discovery.  The key provision is (b)(2) which provides protection only if “the holder of the privilege or protection took reasonable steps to prevent disclosure.”  So, Rhoads took that “reasonableness” question head on in a scenario where the plaintiff Rhoads admittedly (yet inadvertently) produced over eight hundred privileged, electronic documents.  The decision is significant because it used the five-factor test stated in Fidelity, but put an undue weighting on the final test which was: “whether the overriding interests of justice would be served by relieving the party of its errors.”   This approach potentially threatens the development of sound case law that will be necessary to help the deployment of FRE 502 into practice because it casts too much uncertainty with its weighting of “fairness” (a problematically vague notion) in the analysis.  It will be interesting to see if/how this approach is subsequently adopted as we enter the New Year.

2) Qualcomm Inc. v. Broadcom Corp., 2008 WL 66932 (S.D. Cal. Jan. 7, 2008)

This for many was the case of the year given it’s far reaching implications for the legal community.  Some have argued that this isn’t an e-discovery abuse case per se, but more of an example of discovery abuses that just so happened to be centered around ESI.  In either case, the fraud, resulting cover-up, sanctions, ethical issues and privilege discussions made for insightful and thought provoking reading throughout 2008.  The lasting takeaway from Qualcomm appears to be the implications of not just committing discovery abuses, but the failure of having a well thought out e-discovery plan that is actively executed/monitored by outside counsel.  The resulting tension between outside counsel, inside counsel and the internal IT department may continue to escalate if more cases like this make the headlines in 2009.

1)  E-Discovery Case of the Year: Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008)

Judge Grimm’s hallmark opinion has had the legal community buzzing over the past several months and the reason appears pretty straight forward.  In Victor Stanley Grimm builds on the holdings in Seroquel, O’Keefe and Equity Analytics, to boldly cast doubt on a practice so routine that it’s literally shocked the legal community into reevaluation:

(“[D]etermining whether a particular search methodology, such as keywords, will or will not be effective certainly requires knowledge beyond the ken of a lay person (and a lay lawyer) . . . .”

The notion that electronic discovery search is beyond the ability of most attorneys has caused tremors within the litigation support community who had a long history of blindly receiving keywords from counsel, running them and turning back over the results – often blissfully unaware of the extent to which those keyword searches actually located relevant information.  Victor Stanley‘s analysis of the “reasonableness” of search protocols also has impact on the FRE 502 and therefore cements its place alongside other e-discovery “must reads” such as Zubulake and Morgan Stanley.

The cases above are my Top 5.  What additional cases do you think were important?  Please let me know by commenting on the cases you think shaped electronic discovery in 2008 and why.

Learn More On: Frcp Electronic discovery.

Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth.  The myth is that there is an agreed upon definition of concept search.  In actuality, when people in electronic discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not actually talking about concept search technology at all and are actually talking about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling.  First, culling needs to be transparent.  You need to be able to get agreement with or at least explain to the opposing side and the court exactly how you have culled the data set.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either agree on the set of search terms with the opposing side or to use e-discovery search best practices to perform keyword searches on your own.

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that your group of documents that have been categorized as being related to, for example, a company’s hiring policies include all of the documents in your matter related to hiring policies or that they do not include some documents that may not really be related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.

So concept search can help address the under-inclusive problem with keyword search, (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with concept categorization.  These technologies have largely been designed as “black boxes”, which as I have discussed in the past, makes sense for Enterprise search but not for e-discovery search, and, as a result, could also be potentially difficult to explain and defend.   For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.

Demystifying Concept Search in Electronic Discovery

Tuesday, October 28th, 2008

Concept or content search continues to be a hot topic within the e-discovery community.  There’s a continuous stream of articles that discuss it.  Some that point out the positive.  Others that point out the limitations.  The courts have also gotten involved in the discussion.  Judge Grimm refers to concept search in e-discovery in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  Judge Facciola discusses concept search in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 and other opinions.  Despite (or maybe because of) all the commentary on this topic, I find that while a lot of people think that concept search in e-discovery is good, many are not fully sure of exactly what concept search is, and how it is practically useful in e-discovery.   It’s pretty clear that after several years of commentary and hype, concept search has become something of a buzzword associated with many myths and misconceptions.  In an effort to better understand what concept search is and how it can help in e-discovery, I want to dispel two of the most common myths I have heard.

The “Concept Search is Concept Search” Myth

The first myth around concept search actually revolves around what it is.  In my experience, people tend to lump two different technologies together when talking about concept search: concept search and concept categorization.  It’s very common, for example, to see commentators say concept search even when what they are really talking about is concept categorization.  To make matters more confusing, people also use a plethora of other names including content search, content clustering or concept clustering when what they really mean is concept categorization.

So, what are the differences between concept search and concept categorization?  First, let’s start with concept search.  Concept search technologies find documents containing “concepts”.  I think that the Sedona Conference’s “Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery“, provides a good definition of “concept” when used in a search context: “the combination of [a] query term and the additional terms identified by the thesaurus.”  In other words, concept search technologies find documents containing a specified term plus additional terms with similar meanings derived from a thesaurus.

Concept categorization, on the other hand, is actually not a search technology at all.  Concept categorization technologies do not “find” documents.  Rather, they categorize or group documents based on their similarity.   There are many different ways to group documents based on similarity.  Techniques include statistical (which assesses similarity based on word frequency), Bayesian classification (which weights words differently depending on factors in addition to statistical frequency, such as where the terms appear in a document), and semantic indexing (which takes into account the fact that many words used in a similar context may have a similar meaning).  It would take more time to describe these technologies in detail but the Sedona commentary has a good summary of these different technologies if you are interested in learning more.

As should now be apparent, these technologies are very different and using the same words to describe them is confusing.  It’s why it’s not surprising that a lot of the users of e-discovery services and software don’t have a strong understanding of what these technologies are or what benefits they can actually provide in practice.  Dispelling the myth that they can be lumped together is a critical first step in any conversation about concept search and how it can help in e-discovery.  This leads us to a second myth, that Concept Search is better than Keyword Search.  I’ll discuss this in my next blog post.

The “Artful” E-Discovery Dodger

Monday, October 13th, 2008

E-Discovery search has become a hot topic of late (in blogs and in the news), and I think it’s pretty clear that the unwashed (attorney) masses still don’t really grok the importance of using a defensible search protocol.  Neither do they seem to understand the enhanced scrutiny that’s being applied by the judiciary.

Kipperman v. Onex Corp., 2008 WL 4372005 (N.D. Ga. Sept. 19, 2008) is another in what will assuredly be a long string of cases that demonstrate how easy it is for litigators to get wrapped around the axel of e-discovery search.  In Kipperman, the defendant (Onex) presented several motions to the court, including attempts to obtain relief from the need to produce email identified after searching several backup tapes.

During a previous hearing the court ordered Onex to search all the mailboxes on two tapes, as well as on an additional tape selected by Plaintiff. The court determined that despite Onex’s objections and representations, the backup tapes were “producing meaningful discoverable information.”  The court was nevertheless sympathetic to Onex’s burden and therefore weighed in with some guidance:

“The court did suggest, … , that Plaintiff be more artful with its search terms and that Plaintiff utilize a list of the people, provided by Defendants, to review whether all mailboxes needed to be searched.”

The court also gave Onex the chance to narrow the search terms.  Unfortunately, they didn’t seize the opportunity to provide a narrower list or a refinement of their search terms.  “As such, they agreed to search and restore all the mailboxes with the search terms provided by Plaintiff.”

Not surprisingly, Onex then sought relief from having to review and produce all of the results from the search because the “broad search terms resulted in thousands and thousands of irrelevant hits.”  For example, the search terms included the word “republic” which used to elicit emails regarding Republic Builders Products, one of the companies involved in this matter.

“Defendants claim that the search captured thousands of irrelevant pages due to one occurrence of the word ‘republic’ often related to Onex business interests having nothing to do with Magnatrax in the ‘Republic of France,’ ‘Republic of Ireland,’ and ‘Czech Republic’.”

Again the court reaffirmed their sympathy with Onex’s burden and yet denied the requested relief, in large part because Onex was warned about not being more “artful”:

“[T]he court is not unsympathetic to the massive amount of discovery involved in this matter, the considerable burden of working with it, and the overproduction that often comes with e-mail production. Therefore, the court gave Defendants numerous tools by which to reduce the burden of e-mail discovery, including an opportunity to limit Plaintiff’s search terms and an opportunity to provide a list by which the number of peoples and the number of boxes being searched could be reduced. Defendants did not take advantage of these opportunities. Defendants must now lie in the bed that they have made. Thus, Defendants’ objections on the basis of relevancy and volume are DENIED.” (emphasis added).

Needless to say, Kipperman is probably not all that atypical.  Attorneys everywhere have historically used blunt e-discovery search instruments and haven’t often run afoul of the judiciary.  Now, post Victor Stanley, et al, the playing field has changed dramatically.  It’s important to leverage best practices (from Sedona and others), craft a defensible search strategy, sample the results and “show your work.”  Missteps along the way, especially ones that the court has tried to help the parties avoid won’t be met with much tolerance