Posts Tagged ‘legal discovery’

Why Transparent Search In E-Discovery Is The Answer To Victor Stanley

Tuesday, August 26th, 2008

In my last post, I discussed how the “black box” design of enterprise search engines makes it challenging to defensibly use keyword search in e-discovery and follow Judge Grimm’s guidance in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  In Victor Stanley, Judge Grimm notes that because keyword search technology is prone to producing over- and under-inclusive results, attorneys using keyword search should adopt one of two approaches: either collaborate with the opposing party to agree on keyword search methodology, or utilize best practices that demonstrate they have taken reasonable measures to reduce over- and under-inclusiveness.  However, the black box search technologies that are used in e-discovery today make following this guidance difficult.  They can’t reduce under-inclusiveness without increasing over-inclusiveness.  And they make it expensive to utilize collaborative or best practices methodologies including testing, sampling, refining and documenting searches.  All of which begs an obvious question: what can be done to improve search for e-discovery?

In my opinion, the answer is simple: e-discovery search needs to become more transparent.  Instead of being forced to feed one search query at a time into a “black box” search engine and then getting results  with no idea how those results were generated, lawyers and litigation support professionals need technology that provides them with greater visibility into the search process. They need to understand how the results were obtained, so they can reduce both the over- and under-inclusiveness of keyword search, and easily follow Judge Grimm’s advice to improve the defensibility of their search methodology.

A transparent search solution should have four key elements:

  1. Transparent query expansion. Query expansion is the process by which search engines take the query that the user submitted and expand or convert it into a new and improved form.  Wildcard, stemming, concept and fuzzy searches all follow this query expansion process.  For example, the search “divers*” would be expanded to search for all the words that start with “divers” in the data set, such as “diverse,” “diversity,” “diversion,” “diversification,” etc.  In transparent search, query expansion would be exposed to users, allowing them to include or exclude expanded keywords. To continue with the previous example, a user who is searching for documents related to diversity would then have the ability to exclude false positive expanded terms, such as “divers,” “diversion,” and “diversification,” from the search.  Making query expansion transparent can significantly reduce the over-inclusiveness of keyword search.  It also makes it practical to use technologies, such as concept and fuzzy search, that have not been used to date because of their complexity and tendency to produce massively over-inclusive results.
  2. Multiple query support. When a search contains multiple keyword queries, such as “hiring” and “interview,” transparent search should provide visibility into the results for each individual query as well as the combination of all the queries. For example, with the search “hiring OR interview,” users should have separate visibility into the results for “hiring” and “interview” as well as “hiring OR interview.”  They should know that out of the 100 documents that match “hiring OR interview,” only 5 match “interview” and 95 match “hiring.”  This kind of visibility is critical if you want to either collaborate or follow search testing, sampling, and refinement best practices when there are a large number of queries.
  3. Rapid sampling. Transparent search should support the ability to rapidly sample the results from all of the individual queries, such as “hiring” and “interview”, contained within a search. It should also be easy to take a random sample of non-matching documents in order to assess whether one or more searches have identified as many of the relevant documents as possible.  As Judge Grimm states in Victor Stanley when assessing keyword searches used to find privileged documents, “The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive.”
  4. Automated documentation. Transparent search technology needs to document all aspects of the search process including (but not limited to) any keyword that has been excluded during transparent query expansion, the combined results of a search containing multiple individual queries, and the results for each of the individual queries within that search.  Automatically documenting the search methodology used and the results obtained is critical so that users can “show their work” if their search methodology is ever called into question.
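
To make the four elements above a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the toy corpus, the function names (`expand_wildcard`, `run_query`) and the shape of the `search_log` record are all assumptions made up for this example, and a real collection would need proper tokenization and indexing.

```python
import fnmatch
import random

# Hypothetical toy corpus (document id -> text), invented for illustration.
DOCS = {
    1: "our diversity hiring program starts in may",
    2: "the divers surfaced near the pier",
    3: "portfolio diversification memo",
    4: "interview schedule for the diverse candidate pool",
    5: "lunch menu for friday",
}

def tokens(text):
    """Split text into a set of lowercase words (naive whitespace tokenizer)."""
    return set(text.lower().split())

def expand_wildcard(pattern, docs):
    """List every distinct term in the corpus that matches the wildcard."""
    vocab = set().union(*(tokens(text) for text in docs.values()))
    return sorted(fnmatch.filter(vocab, pattern))

def run_query(terms, docs):
    """Return the ids of documents containing at least one of the terms."""
    wanted = set(terms)
    return {doc_id for doc_id, text in docs.items() if tokens(text) & wanted}

# 1. Transparent query expansion: show the expansion, let the user exclude terms.
expanded = expand_wildcard("divers*", DOCS)
excluded = {"divers", "diversification"}  # false positives chosen by the user
kept = [term for term in expanded if term not in excluded]

# 2. Multiple query support: hits for each query and for the combined (OR) search.
queries = {"divers*": kept, "hiring": ["hiring"], "interview": ["interview"]}
hits = {name: run_query(terms, DOCS) for name, terms in queries.items()}
combined = set().union(*hits.values())

# 3. Rapid sampling: sample non-matching documents to check for missed material.
rng = random.Random(0)  # fixed seed so the sample can be reproduced later
non_matching = sorted(set(DOCS) - combined)
null_sample = rng.sample(non_matching, k=min(2, len(non_matching)))

# 4. Automated documentation: record what was run and what came back.
search_log = {
    "expanded_terms": expanded,
    "excluded_terms": sorted(excluded),
    "hits_per_query": {name: sorted(ids) for name, ids in hits.items()},
    "combined_hits": sorted(combined),
    "non_matching_sample": null_sample,
}
```

The point of the sketch is that nothing is hidden: the expanded term list, the per-query counts, the sample of non-matching documents and the final log are all ordinary values a reviewer can inspect or hand to the other side.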

Benefits of Transparent Search

By addressing the main technology challenges of keyword search, transparent search provides significant benefits to attorneys and litigation support professionals using search for e-discovery. First, parties that adopt transparent search can improve the defensibility of their e-discovery search practices. By enabling iterative testing, sampling and refinement, transparent search allows users to adopt the approaches recommended by Judge Grimm when it was previously impractical to do so.  At the end of the day, this means less risk.

Second, the use of transparent search can substantially reduce downstream production and review costs by removing false positives. For example, it is not uncommon for certain wildcard searches to generate results where 20-40% of the included documents are false positives that can be removed by transparent query expansion.  This can result in thousands of dollars of savings on a single search query.

Finally, transparent search can dramatically reduce the time and cost required to complete the search and culling stage of e-discovery. Currently, it can take hundreds of hours to run a significant number of searches one at a time, document the results of each search, and sample and refine each individual query. With transparent search, running multiple queries and documenting each of the individual results takes minutes. Sampling each of the individual queries takes seconds.

When it comes to e-discovery search, it’s important to recognize that there are no “silver bullets.”  Search will remain an imperfect science with the possibility of over- and under-inclusive results.  But equally, there is no doubt that search remains the best solution for reducing the vast quantities of electronic information that are a part of every e-discovery process down to a reasonable level for human review. While attorneys and litigation support professionals can’t completely remove the imperfections of keyword search, they can, with transparent search, take action to minimize the impact of these imperfections and defensibly meet the requirements of new case law.  In doing so, they will be able to turn their attention to where it should be: the substance of the case.

Five E-Discovery Questions with Craig Ball

Tuesday, August 12th, 2008

In the spirit of the popular New York Times magazine feature, with this post we inaugurate what we hope to be a long-running series of interviews with e-discovery luminaries to get their take on emerging ideas and trends (and hopefully have some fun as well).

Today’s questionee is e-discovery and forensics expert (and popular Law Technology News columnist) Craig Ball.  Craig’s combination of wit and insight speaks for itself, so let’s just get right to the questions.

1) The cases that are on everyone’s mind are O’Keefe/Lundin and Victor Stanley. What’s the practical impact of these rulings to the e-discovery practitioner?

Certainly these decisions have captured my enthusiastic attention.  Lawyers now have to devote greater care and thought to electronic search, and wake to the empirical evidence establishing the shocking shortfalls of keyword search in unstructured ESI collections.  The days of “let’s try these search terms and see what happens” are numbered.  Queries that will be run across mushrooming collections must pass muster in terms of noisiness, ambiguity, potential for misspelling, affinity to stemming, synonyms, slang, acronyms, IM-speak and other criteria unfamiliar to a profession that prides itself on precise expression.  Lawyers need to embrace concepts of “precision,” “recall” and “sampling” with the same fervor we once brought to the Statute of Frauds and the Rule Against Perpetuities.

Currently, lawyers on both the north and south sides of the docket are the unjust beneficiaries of slipshod search.  Requesting parties benefit from the economic leverage attendant to costly-yet-unavailing fishing expeditions while counsel for producing parties mint obscene pyramidal profits reviewing mountains of electrochaff.  Despite all the vitriol, rarely does either side’s counsel set out to exploit flawed searches.  It’s mostly blissful ignorance at work, coupled with little incentive to fix what’s broken.  Accordingly, Judges like Facciola and Grimm are picking up the baton and running with it.  It’ll be a long, tough race—and not every jurist will head for the tape—but I applaud those who’ve left the blocks!

Search demands nuance, discipline and scientific method.  Prepare to routinely test queries against sample collections, as that practice will soon be as commonplace as DNA testing in paternity cases.

2) What can e-discovery technology providers do to help?

At the risk of appearing ungracious, I can’t help but note that vendors eat at the same gluttonous table as lawyers, and vendor marketing is often so much snake oil.  Until the EDD vendor community takes a longer view of the market, stops building businesses for acquisition and starts building them to last, I don’t think they can be of much help.  The industry should stop pretending their processes and software are “proprietary” and touting their secret sauces.  Instead, how about delivering consistent, predictable service and pricing delivered by experienced, reliable and unflinchingly honest, genuinely knowledgeable personnel who welcome the chance to help lawyers understand this stuff.  If employees stayed around more than six months, that would be nice, too.

3) You recently participated in a new track at LegalTech West called FutureTech.  For those who missed it or the follow-up podcasts, what’s an emerging e-discovery trend that you think might take people by surprise?

Several come to mind.  Mediated meet-and-confer, for example.  The cost of a failed EDD effort can dwarf the amount in controversy, so it makes sense to turn to neutral, technically adept intermediaries to help resolve nettlesome questions of scope, search, forms of production and cost sharing.  Folks just behave better when company comes.  I also foresee divergence between discovery and the other traditional phases of litigation.  We may see entirely different teams handle discovery in a zealous but non-confrontational manner, leaving the scorched earth stuff to others.

Another development that will sneak up on most lawyers is the growing marginalization of text.  As natural interfaces emerge—where you will talk or gesture to your computers—and as communication gets more real time and visual, words will manifest conduct less frequently.  Take YouTube.  I don’t get it—to me, it’s silly and boring—but it’s rich and exciting to my kids…and text is tertiary.

Something else that will change is where we look for evidence.  If you were pursuing discovery against a teenager, where would you go to locate their most revealing ESI?   Social networking (virtualized storage)?   Cell phones and laptops (portable devices)?   Gaming devices (alternate platforms)?  In ten years, don’t imagine they won’t favor and extend the tools they grew up with.

Data is the ultimate portable commodity, so it’s odd we don’t take our computing environments with us. We will. If desktop machines survive, they will be little more than screens with network connectivity temporarily hosting the virtual identities we carry in our pockets or store online. Local hard drives will be an increasingly irrelevant place to search for files as EDD turns to personal storage devices and online storage.

Other trends lawyers may not foresee: People will retain much more data as there will be little incentive and less time to make it go away. “Cheaper to keep her” will be how most of us deal with data.  Location data will be routinely tracked by many devices with GPS functionality on and about our person, so this will become a new and useful evidence stream.  Virtual machines will be used as forms of production.  Local storage will give way to cloud storage.  Hey, I could do this one all day!

4) You have an extensive background in both e-discovery and computer forensics. Do you see a convergence, or will they remain largely separate worlds from a process and technology perspective?

I see convergence already.  “Forensically sound” practices are creeping into EDD harvest and traditionally rigid approaches to disk forensics are being challenged by the practical realities of immense volume and mission-critical operations.   We see the growth of “live” forensics, hash values displacing Bates numbers and operating systems allowing more and more deleted information to be easily resurrected.

The tools and techniques of each discipline are also converging.  But there will remain a distinction between the two flowing from the unique ability of a skilled forensics examiner to distill the bits and bytes into a compelling tale of human strength or frailty.  It’s painfully easy to misread the significance of digital footprints.  There’s a component of science and art to computer forensics that will ensure its distinction and growth.

We face convergent challenges, too.  In both forensics and EDD, the lure of lucre pulls in people who really ought to be doing something less harmful.  Lives, liberty, fortunes, and careers hinge on some computer forensic examinations; yet, some schools and tool sellers promote the notion that you can learn what you need to know over a long weekend.  Just as many copy shops decided they were e-discovery experts one dark night, a lot of poorly trained, incurious and careless forensic examiners are popping up all over.  I’m frankly appalled by some of what I see out there.   Where I hope we ultimately converge is a high standard of professionalism and proven expertise.

5) Finally, the question on the mind of every loyal “Ball in Your Court” reader: Which court is it — basketball, tennis, or volleyball?

I’ve never been much for team sports, but if I have to choose, I opt for the one played on the beach by fit, bikini-clad women.  I may be a hopeless nerd, but I’m not stupid.

Socha-Gelbmann Survey For 2008 Highlights Shifting Landscape In E-Discovery Software

Thursday, July 24th, 2008

Yesterday, George Socha and Tom Gelbmann published summary results for their 2008 EDD survey. George and Tom gathered self-reported data from 85 e-discovery service providers and 40 e-discovery software companies. To help vendors resist the temptation to “exaggerate” their accomplishments, they then cross-referenced the responses against independent surveys submitted by 29 law firms and 19 corporations, and applied a healthy dose of their own good judgment. The outcome, which they will publish in full next month, is a great snapshot of the industry, and probably the most objective ranking of e-discovery vendors that you can find.

By comparing this year’s results to the 2007 survey, you get a sense for how much has changed in the e-discovery world over the past 12 months:

Top E-Discovery Software Companies

[Chart: Top E-Discovery Software Companies rankings]

Note: arrows show change to rankings from last year’s Socha-Gelbmann Survey

Autonomy and Clearwell move up to the Top 5, overtaking Attenex and CT Summation which slip back to the second tier. There are also 3 new names ranked 6 through 10 (Epiq, iConect and Symantec) who displace Cataphora, Doculex, ISYS, and Oracle, none of whom even make it into the top 15. In other words, 70% of the rankings have changed since last year.

If a litigation support manager were to focus only on the Top 5 in making her e-discovery software decision, she would have a choice of some very different solutions. Autonomy positions itself as a high-end (expensive) platform for corporations, while Lexis offers a comprehensive toolset for law firms. Guidance and Clearwell are complementary in that both provide best-of-breed solutions for parts of the EDRM model: Guidance is the leader in collection and preservation, while Clearwell is the leader in processing, analysis and review. Finally, FTI takes a services-based approach which centers around RingTail, its hosted review application.

Looking lower down the list, there were some other interesting results, primarily around which companies were NOT ranked. Kazeon made it into the third tier (ranked 11-15) whereas StoredIQ, its main competitor, did not. Nor did Recommind break into the rankings, despite making a major push into e-discovery from knowledge management over the past year. But the most striking absentees are PSS Systems and Exterro, which have pioneered litigation hold management for Fortune 100 companies. I can only guess that they cover too much of a niche market to warrant inclusion in an industry-wide report.

Top E-Discovery Service Providers

In contrast to the world of software, e-discovery services saw much less movement in this year’s rankings:

[Chart: Top E-Discovery Service Providers rankings]

Note: arrows show change to rankings from last year’s Socha-Gelbmann Survey

There was only one change to the top 5: Fios moved up, displacing Guidance which plummeted 10-20 places down to a 16-25 ranking. In addition, there were two new players in the top 10, Epiq and Huron, who edged out Electronic Evidence Discovery and Ernst & Young.

Conclusion

Changes to the software rankings reflect broader changes in the e-discovery market. As e-discovery has moved in-house, corporations have become a major driver of purchase decisions that were previously left to law firms. Many software companies, such as Attenex, have struggled to make this transition, while others, such as Clearwell, have capitalized on it. There has been no such change in the service provider world and, as a result, the rankings are relatively stable.

It will be interesting to see what happens next year. Every other software space is dominated by a small number of players, like Oracle for databases or VMware for virtualization. If the same is true for e-discovery, then we can expect many fewer changes to the software rankings in future surveys as the leaders pull away from the pack.

Review-less E-Discovery Review

Monday, July 21st, 2008

Most science fiction visions of the distant future seem to contain a rather singular fear: that the human race will be taken over by computers.  Think of the “Terminator” series, preferably without the naked Arnold Schwarzenegger visual.  Regardless of whether this vision fills you with trepidation or excitement, there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.

For the past several decades, attorneys have had to manually review information for relevancy and privilege in response to the e-discovery process.  Quoting from Information Inflation: Can the Legal System Adapt? by George Paul and Jason Baron, this task has always been viewed as sacrosanct “because of ‘death penalty’ waiver doctrine that evolved long ago when information was still manageable.”

Like so many industries, the legal profession has attempted to grapple with the transformation that the digital revolution has brought to the forefront.  The latest revisions to the Federal Rules of Civil Procedure (FRCP) are the most obvious case in point.  And yet, electronically stored information (ESI) is proving difficult to fit into traditional, even remodeled, paradigms.  Even ignoring (for the moment) the proliferation of novel data types (e.g., blog content, voice over IP or VoIP, webmail, text messaging, web services, etc.), the amount of data that attorneys are being required to review has reached a tipping point of review feasibility.

Back in the day, information was viewed in terms of bankers boxes, and even in the most document-intensive discovery matters this measuring stick supported the belief that armies of attorneys could conceivably conquer the massive document review problem.  But now, we often see clients that process routine matters containing terabytes of information.  Most of us in the e-discovery space have become numbed to the abstract nomenclature of megabytes, gigabytes, terabytes [i] and petabytes [ii], and in the process we may have failed to realize that we have moved well beyond the scale of information that can be reasonably attacked with even the largest armada of contract attorneys (assuming that the client could conceivably bear the astronomical costs).

“At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.” [iii]
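
For a rough sense of what that scale means in dollars, the per-terabyte estimate in the first footnote (75 million pages, $18,750,000 to review) can be extrapolated with a few lines of Python. This is only a back-of-the-envelope sketch: the linear scaling and the decimal conversion of 1,000 terabytes per petabyte are my assumptions, not figures from the cited article.

```python
# Figures taken from the first footnote (Kershaw, 2005).
PAGES_PER_TERABYTE = 75_000_000
REVIEW_COST_PER_TERABYTE = 18_750_000  # dollars

# Implied manual review cost per page.
cost_per_page = REVIEW_COST_PER_TERABYTE / PAGES_PER_TERABYTE

# Assumption: decimal units, i.e. 1 petabyte = 1,000 terabytes.
TERABYTES_PER_PETABYTE = 1000

def estimated_review_cost(terabytes):
    """Estimated manual review cost in dollars, assuming cost scales linearly."""
    return terabytes * PAGES_PER_TERABYTE * cost_per_page

one_petabyte_cost = estimated_review_cost(TERABYTES_PER_PETABYTE)
```

Under those assumptions the footnote implies about 25 cents per page, which puts a single petabyte at roughly $18.75 billion of eyes-on review.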

I’m certainly not the first to point out that this tipping point is coming, but now we are really starting to see early adopters respond to this sea change. In their linked article above, George Paul and Jason Baron state “It is no exaggeration to say that litigation, as we have known it, is threatened by information’s new hyper-flow. The amount of electronically stored information relevant to a case is already a stress point in litigation.  […]  Litigators can no longer depend on manual review alone….”

Up until now, attorneys and the clients that are footing the bill have had to make a Hobson’s choice:  either “force parties to continue hugely expensive privilege reviews, or to forego the attorney-client privilege or work-product privilege altogether.”   But, now it appears that another way is evolving.

The following lays out a scenario where a non-manual review methodology may make sense.  Please note: this approach is not without risk.  At this moment in time neither clawback provisions, the potential adoption of Evidence Rule 502 nor any other known prophylactic measure can completely insulate a producing party from the unforeseen consequences of an inadvertent disclosure.  But, as they say, desperate times call for desperate measures….

Step one: Evaluate the Environment

The following factors represent some of the elements that should be taken into consideration prior to skipping the normal, human based review steps that are seen in most e-discovery matters.

  1. Large data set.  This may sound a bit obvious, but a non-manual approach is best suited for large, unwieldy data sets.  The corpus doesn’t need to be in the terabytes, but the data set should be evaluated in terms of discovery processing costs and attorney review estimates.
  2. Short Production Timelines.  Once the above calculations are conducted, the next step is to determine if a human based review could even conceivably be conducted in the given time frame.  In many instances, an eyes-on review process just won’t be feasible since there won’t be enough bodies to throw at the problem.
  3. Next Gen “PAR” Tools.  In order to pull this “review-less” review process off, both safely and quickly, the responding party needs to have access to fast, robust processing, analysis and review (“PAR”) tools.  Certainly, it’s possible to have this scenario work with an e-discovery service provider, if they have the capability.
  4. Relatively Small Amount in Controversy.  For the time being, this approach should not be considered for any “bet the company” litigation, nor anything with significant downside risk (governmental inquiries, punitive damages, class actions, 2nd requests, etc.).  Yet, for many standard commercial lawsuits, corporate investigations, HR claims, etc. this review-less approach may be worth considering.
  5. Ability to Use a Clawback Provision.  Entering into a clawback provision with the opposition is mandatory in this methodology since the chances of an inadvertent production are statistically ever-present.  Yet, until Evidence Rule 502 is resolved, there will always be a risk that the clawback won’t be enforceable against 3rd parties.
  6. Non-governmental Production.  Most information in governmental productions becomes part of the public record, meaning that a clawback isn’t going to be feasible.  Here, trade secret information, personally identifiable data and the like would be disastrous if pushed out into the public domain.
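
The first two factors boil down to arithmetic that is easy to sketch. The Python below estimates how many reviewers a deadline would demand and what an eyes-on review would cost. Every input (document count, review speed, hours per day, hourly rate) is a hypothetical placeholder chosen for illustration, and the function names are my own; substitute the numbers from your own matter.

```python
import math

def reviewers_needed(total_docs, docs_per_reviewer_hour, hours_per_day, days_to_deadline):
    """Smallest number of reviewers who could finish an eyes-on review in time."""
    capacity_per_reviewer = docs_per_reviewer_hour * hours_per_day * days_to_deadline
    return math.ceil(total_docs / capacity_per_reviewer)

def review_cost(total_docs, docs_per_reviewer_hour, hourly_rate):
    """Total attorney review fees in dollars for a full manual review."""
    review_hours = total_docs / docs_per_reviewer_hour
    return review_hours * hourly_rate

# Hypothetical matter: 2 million documents, 50 documents per reviewer-hour,
# 8-hour days, 30 days until production, $60 per hour for contract review.
needed = reviewers_needed(2_000_000, 50, 8, 30)
cost = review_cost(2_000_000, 50, 60)
```

With these made-up inputs the review would need 167 reviewers working every day until the deadline and cost $2.4 million, which is the kind of result that should prompt a hard look at the remaining factors.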

Step two: Perform a Risk/Benefit Analysis

Next, take all the above factors into consideration and determine if the risks (of inadvertent production, the clawback being ineffective, etc.) are worth the benefits (reduced costs, lower attorney review fees, ability to meet deadlines, etc.).

Sure, this is hard work, but the alternative (manual review) is more ephemeral than realistic.

[In my next post, I’ll address the tactical steps to conduct a review-less review process.  Stay tuned……]

[i] One terabyte is generally estimated to contain 75 million pages and could conceivably cost $18,750,000 to review.  Anne Kershaw, Automated Document Review Proves Its Reliability, 5 DIGITAL DISCOVERY & E-EVIDENCE 11 (2005).

[ii] According to Wired, we’re now in the “Petabyte Age” where that amount of data is processed by Google’s servers every 72 minutes.

[iii] Wired article, above.

Live from LegalTech West: The E-Discovery Tug of War

Friday, June 27th, 2008

Hello from Los Angeles, where the weather’s fine and summer’s in full swing! Accordingly, a few of us in the legal technology community spent the night before LegalTech enjoying a Dodgers game hosted by LTN editor-in-chief and rabid Yankees fan Monica Bay (outfitted in full Yankee regalia for the occasion). So as to not incur Monica’s wrath, I left my Red Sox cap at home.

At the game, I happened to sit next to a colleague from another vendor who mentioned that her firm is about to celebrate twenty years in e-discovery.

Twenty years! What a remarkable milestone for any company. It got me wondering about how much technology has evolved over that time period, and raised an interesting question to noodle over between innings: With all of the investment and innovation in the e-discovery space, who’s actually winning the e-discovery tug of war, twenty years in?

What is the e-discovery tug of war, you ask? Let’s start with the scene in 1988.

On one side, the documents: They stared at you from across the mud puddle — hundreds or even thousands of boxes stacked one on top of another, hauled out from a warehouse where they’d spent their days, against their will, in windowless solitude, ready for battle. They were ticked.

And on the other side, you: With your new IBM PS/2 Model 80 (the best money could buy: 640×480 VGA color screen, 16 MHz 386 processor, 80MB hard drive), flatbed scanner, and some new DOS-based database program called “Concordance.” To add insult to injury, Starbucks hadn’t even really gone national yet, so you were probably stuck with a jar of instant coffee to try to stay awake.

You didn’t stand a chance.

From then until now, two different dynamics have played against each other, pulling the flag back and forth over the dividing line:

  1. On one side, the explosive growth of electronic documents has been truly mind-boggling. From a baseline of close to zero in 1988 (WordPerfect 5.1 wasn’t introduced until 1989), today essentially every single business document is created, transmitted, and stored electronically.
  2. On the other side, technology innovators in the e-discovery space have used creativity and a large dose of Moore’s Law to store, process, and search electronic documents with ever-increasing speed and efficiency.

During the seventh inning stretch, with the Dodgers holding a commanding lead over the White Sox, I thought: Maybe technology is about to win.

Here’s the argument: Assuming that the creation of document content will still largely be human-driven, now that most every legally significant class of communication is being created and managed on-line, growth of e-discovery-relevant data volumes may quickly move from being exponential (when everything was “going digital”) to a rate driven more by productivity improvements and economic growth. Processing, search, and analysis of documents, however, will continue to improve at a Moore’s Law pace for the foreseeable future, presumably making it fairly trivial for advanced e-discovery technologies to outmuscle their longtime adversary.

Google shows some evidence of this victory of technology over data. Remember that just a few years back, search engines frequently trumpeted how much of the Internet they were able to index – and it was far from the whole thing. Today, that’s largely a solved problem. It’s simply amazing how quickly Google’s index ingests new data, often in what seems like a matter of minutes. In fact, I dare say that by the time you read this post, you’ll be able to perform a Google search on some of its content and have it come up front-and-center in your search results. Amazing.

What does this mean for e-discovery? The best e-discovery technologies will change to solve challenges that are far more strategic in nature. Instead of focusing on how fast and effectively they can process documents, or how quickly they can allow attorneys to review them, they’ll provide powerful capabilities for addressing some of the most important e-discovery problems that inside and outside counsel face, such as:

  • How do I craft robust, defensible search strategies for my cases while minimizing e-discovery costs?
  • How can I standardize a repeatable, high-quality discovery process that’s executed consistently across my organization?
  • How can my organization become more proactive in identifying potential legal risks and liabilities based on our company’s “legal history”?

I’m sure you can come up with a number of others. What do you think – is the war against documents over, and e-discovery ready to move to a new phase? Or are there still many more battles to be fought?