Archive for the ‘e-discovery workflow’ Category

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This blog explores the promise of predictive coding technology, why predictive coding has not been widely adopted in eDiscovery, and explains why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page by page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or day dream, many believe computers can even predict document relevance better than their human counterparts.

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically relevant sample is not always intuitive to the end user.  This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if it is inadvertently produced to the opposing party.

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern what language  led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive number of irrelevant documents.  This problem is generally referred to as ‘excessive recall,’ and it is important because this lack of precision increases the number of documents requiring manual review which directly impacts eDiscovery cost.

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and the way they work simply isn’t transparent.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology that hasn’t existed until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in the Sedona Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.

Losing Weight, Developing an Information Governance Plan, and Other New Year’s Resolutions

Tuesday, January 17th, 2012

It’s already a few weeks into the new year and it’s easy to spot the big lines at the gym, folks working on fad diets and many swearing off any number of vices.  Sadly perhaps, most popular resolutions don’t even really change year after year.  In the corporate world, though, it’s not good enough to simply recycle resolutions every year since there’s a lot more at stake, often with employee’s bonuses and jobs hanging in the balance.

It’s not too late to make information governance part of the corporate 2012 resolution list.  The reason is pretty simple – most companies need to get out of the reactive firefighting of eDiscovery given the risks of sloppy work, inadvertent productions and looming sanctions.  Yet, so many are caught up in the fog of eDiscovery war that they’ve failed to see the nexus between the upstream, proactive good data management hygiene and the downstream eDiscovery chaos.

In many cases the root cause is the disconnect between differing functional groups (Legal, IT, Information Security, Records Management, etc.).  This is where the emerging umbrella concept of Information Governance comes to play, serving as a way to tackle these information risks along a unified front. Gartner defines information governanceas the:

“specification of decision rights, and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archiving and deletion of information, … [including] the processes, roles, standards, and metrics that ensure the effective and efficient use of information to enable an organization to achieve its goals.”

Perhaps more simply put, what were once a number of distinct disciplines—records management, data privacy, information security and eDiscovery—are rapidly coming together in ways that are important to those concerned with mitigating and managing information risk. This new information governance landscape is comprised of a number of formerly discrete categories:

  • Regulatory Risks – Whether an organization is in a heavily regulated vertical or not, there are a host of regulations that an organization must navigate to successfully stay in compliance.  In the United States these include a range of disparate regimes, including the Sarbanes-Oxley Act, HIPPA, the Securities and Exchange Act, the Foreign Corrupt Practices Act (FCPA) and other specialized regulations – any number of which require information to be kept in a prescribed fashion, for specified periods of time.  Failure to turn over information when requested by regulators can have dramatic financial consequences, as well as negative impacts to an organization’s reputation.
  • Discovery Risks – Under the discovery realm there are any number of potential risks as a company moves along the EDRM spectrum (i.e., Identification, Preservation, Collection, Processing, Analysis, Review and Production), but the most lethal risk is typically associated with spoliation sanctions that arise from the failure to adequately preserve electronically stored information (ESI).  There have been literally hundreds of cases where both plaintiffs and defendants have been caught in the judicial crosshairs, resulting in penalties ranging from outright case dismissal to monetary sanctions in the millions of dollars, simply for failing to preserve data properly.  It is in this discovery arena that the failure to dispose of corporate information, where possible, rears its ugly head since the eDiscovery burden is commensurate with the amount of data that needs to be preserved, processed and reviewed.  Some statistics show that it can cost as much as $5 per document just to have an attorney privilege review performed.  And, with every gigabyte containing upwards of 75,000 pages, it is easy to see massive discovery liability when an organization has terabytes and even petabytes of extraneous data lying around.
  • Privacy Risks – Even though the US has a relatively lax information privacy climate there are any number of laws that require companies to notify customers if their personally identifiable information (PII) such as credit card, social security, or credit numbers have been compromised.  For example, California’s data breach notification law (SB1386) mandates that all subject companies must provide notification if there is a security breach to the electronic database containing PII of any California resident.  It is easy to see how unmanaged PII can increase corporate risk, especially as data moves beyond US borders to the international stage where privacy regimes are much more staunch.
  • Information Security Risks Data breaches have become so commonplace that the loss/theft of intellectual property has become an issue for every company, small and large, both domestically and internationally.  The cost to businesses of unintentionally exposing corporate information climbed 7 percent last year to over $7 million per incident.  Recently senators asked the SEC to “issue guidance regarding disclosure of information security risk, including material network breaches” since “securities law obligates the disclosure of any material network breach, including breaches involving sensitive corporate information that could be used by an adversary to gain competitive advantage in the marketplace, affect corporate earnings, and potentially reduce market share.”  The senators cited a 2009 survey that concluded that 38% of Fortune 500 companies made a “significant oversight” by not mentioning data security exposures in their public filings.

Information governance as an umbrella concept helps organizations to create better alignment between functional groups as they attempt to solve these complex and interrelated data risk challenges.  This coordination is even more critical given the way that corporate data is proliferating and migrating beyond the firewall.  With even more data located in the cloud and on mobile devices a key mandate is managing data in all types of form factors. A great first step is to determine ownership of a consolidated information governance approach where the owner can:

  • Get C-Level buy-in
  • Have the organizational savvy to obtain budget
  • Be able to define “reasonable” information governance efforts, which requires both legal and IT input
  • Have strong leadership and consensus building skills, because all stakeholders need to be on the same page
  • Understand the nuances of their business, since an overly rigid process will cause employees to work around the policies and procedures

Next, tap into and then leverage IT or information security budgets for archiving, compliance and storage.  In most progressive organizations there are likely ongoing projects that can be successfully massaged into a larger information governance play.  A great place to focus on initially is information archiving, since this one of the simplest steps an organization can take to improve their information governance hygiene.  With an archive organizations can systematically index, classify and retain information and thus establish a proactive approach to data management.  It’s this ability to apply retention and (most importantly) expiration policies that allows organizations to start reducing the upstream data deluge that will inevitably impact downstream eDiscovery processes.

Once an archive is in place, the next logical step is to couple a scalable, reactive eDiscovery process with the upstream data sources, which will axiomatically include email, but increasingly should encompass cloud content, social media, unstructured data, etc.  It is important to make sure  that a given  archive has been tested to ensure compatibility with the chosen eDiscovery application to guarantee that it can collect content at scale in the same manner used to collect from other data sources.  Overlaying both of these foundational pieces should be the ability to place content on legal hold, whether that content exists in the archive or not.

As we enter 2012, there is no doubt that information governance should be an element in building an enterprise’s information architecture.  And, different from fleeting weight loss resolutions, savvy organizations should vow to get ahead of the burgeoning categories of information risk by fully embracing their commitment to integrated information governance.  And yet, this resolution doesn’t need to encompass every possible element of information governance.  Instead, it’s best to put foundational pieces into place and then build the rest of the infrastructure in methodical and modular fashion.

Lessons Learned for 2012: Spotlighting the Top eDiscovery Cases from 2011

Tuesday, January 3rd, 2012

The New Year has now dawned and with it, the certainty that 2012 will bring new developments to the world of eDiscovery.  Last month, we spotlighted some eDiscovery trends for 2012 that we feel certain will occur in the near term.  To understand how these trends will play out, it is instructive to review some of the top eDiscovery cases from 2011.  These decisions provide a roadmap of best practices that the courts promulgated last year.  They also spotlight the expectations that courts will likely have for organizations in 2012 and beyond.

Issuing a Timely and Comprehensive Litigation Hold

Case: E.I. du Pont de Nemours v. Kolon Industries (E.D. Va. July 21, 2011)

Summary: The court issued a stiff rebuke against defendant Kolon Industries for failing to issue a timely and proper litigation hold.  That rebuke came in the form of an instruction to the jury that Kolon executives and employees destroyed key evidence after the company’s preservation duty was triggered.  The jury responded by returning a stunning $919 million verdict for DuPont.

The spoliation at issue occurred when several Kolon executives and employees deleted thousands emails and other records relevant to DuPont’s trade secret claims.  The court laid the blame for this destruction on the company’s attorneys and executives, reasoning they could have prevented the spoliation through an effective litigation hold process.  At issue were three hold notices circulated to the key players and data sources.  The notices were all deficient in some manner.  They were either too limited in their distribution, ineffective since they were prepared in English for Korean-speaking employees, or too late to prevent or otherwise ameliorate the spoliation.

The Lessons for 2012: The DuPont case underscores the importance of issuing a timely and comprehensive litigation hold notice.  As DuPont teaches, organizations should identify what key players and data sources may have relevant information.  A comprehensive notice should then be prepared to communicate the precise hold instructions in an intelligible fashion.  Finally, the hold should be circulated immediately to prevent data loss.

Organizations should also consider deploying the latest technologies to help effectuate this process.  This includes an eDiscovery platform that enables automated legal hold acknowledgements.  Such technology will allow custodians to be promptly and properly apprised of litigation and thereby retain information that might otherwise have been discarded.

Another Must-Read Case: Haraburda v. Arcelor Mittal U.S.A., Inc. (D. Ind. June 28, 2011)

Suspending Document Retention Policies

Case: Viramontes v. U.S. Bancorp (N.D. Ill. Jan. 27, 2011)

Summary: The defendant bank defeated a sanctions motion because it modified aspects of its email retention policy once it was aware litigation was reasonably foreseeable.  The bank implemented a retention policy that kept emails for 90 days, after which the emails were overwritten and destroyed.  The bank also promulgated a course of action whereby the retention policy would be promptly suspended on the occurrence of litigation or other triggering event.  This way, the bank could establish the reasonableness of its policy in litigation.  Because the bank followed that procedure in good faith, it was protected from court sanctions under the Federal Rules of Civil Procedure 37(e) “safe harbor.”

The Lesson for 2012: As Viramontes shows, an organization can be prepared for eDiscovery disputes by timely suspending aspects of its document retention policies.  By modifying retention policies when so required, an organization can develop a defensible retention procedure and be protected from court sanctions under Rule 37(e).

Coupling those procedures with archiving software will only enhance an organization’s eDiscovery preparations.  Effective archiving software will have a litigation hold mechanism, which enables an organization to suspend automated retention rules.  This will better ensure that data subject to a preservation duty is actually retained.

Another Must-Read Case: Micron Technology, Inc. v. Rambus Inc., 645 F.3d 1311 (Fed. Cir. 2011)

Managing the Document Collection Process

Case: Northington v. H & M International (N.D.Ill. Jan. 12, 2011)

Summary: The court issued an adverse inference jury instruction against a company that destroyed relevant emails and other data.  The spoliation occurred in large part because legal and IT were not involved in the collection process.  For example, counsel was not actively engaged in the critical steps of preservation, identification or collection of electronically stored information (ESI).  Nor was IT brought into the picture until 15 months after the preservation duty was triggered. By that time, rank and file employees – some of whom were accused by the plaintiff of harassment – stepped into this vacuum and conducted the collection process without meaningful oversight.  Predictably, key documents were never found and the court had little choice but to promise to inform the jury that the company destroyed evidence.

The Lesson for 2012: An organization does not have to suffer the same fate as the company in the Northington case.  It can take charge of its data during litigation through cooperative governance between legal and IT.  After issuing a timely and effective litigation hold, legal should typically involve IT in the collection process.  Legal should rely on IT to help identify all data sources – servers, systems and custodians – that likely contain relevant information.  IT will also be instrumental in preserving and collecting that data for subsequent review and analysis by legal.  By working together in a top-down fashion, organizations can better ensure that their eDiscovery process is defensible and not fatally flawed.

Another Must-Read Case: Green v. Blitz U.S.A., Inc. (E.D. Tex. Mar. 1, 2011)

Using Proportionality to Dictate the Scope of Permissible Discovery

Case: DCG Systems v. Checkpoint Technologies (N.D. Ca. Nov. 2, 2011)

The court adopted the new Model Order on E-Discovery in Patent Cases recently promulgated by the U.S. Court of Appeals for the Federal Circuit.  The model order incorporates principles of proportionality to reduce the production of email in patent litigation.  In adopting the order, the court explained that email productions should be scaled back since email is infrequently introduced as evidence at trial.  As a result, email production requests will be restricted to five search terms and may only span a defined set of five custodians.  Furthermore, email discovery in DCG Systems will wait until after the parties complete discovery on the “core documentation” concerning the patent, the accused product and prior art.

The Lesson for 2012: Courts seem to be slowly moving toward a system that incorporates proportionality as the touchstone for eDiscovery.  This is occurring beyond the field of patent litigation, as evidenced by other recent cases.  Even the State of Utah has gotten in on the act, revising its version of Rule 26 to require that all discovery meet the standards of proportionality.  While there are undoubtedly deviations from this trend (e.g., Pippins v. KPMG (S.D.N.Y. Oct. 7, 2011)), the clear lesson is that discovery should comply with the cost cutting mandate of Federal Rule 1.

Another Must-Read Case: Omni Laboratories Inc. v. Eden Energy Ltd [2011] EWHC 2169 (TCC) (29 July 2011)

Leveraging eDiscovery Technologies for Search and Review

Case: Oracle America v. Google (N.D. Ca. Oct. 20, 2011)

The court ordered Google to produce an email that it previously withheld on attorney client privilege grounds.  While the email’s focus on business negotiations vitiated Google’s claim of privilege, that claim was also undermined by Google’s production of eight earlier drafts of the email.  The drafts were produced because they did not contain addressees or the heading “attorney client privilege,” which the sender later inserted into the final email draft.  Because those details were absent from the earlier drafts, Google’s “electronic scanning mechanisms did not catch those drafts before production.”

The Lesson for 2012: Organizations need to leverage next generation, robust technology to support the document production process in discovery.  Tools such as email analytical software, which can isolate drafts and offer to remove them from production, are needed to address complex production issues.  Other technological capabilities, such as Near Duplicate Identification, can also help identify draft materials and marry them up with finals that have been marked as privileged.  Last but not least, technology assisted review has the potential of enabling one lawyer to efficiently complete the work that previously took thousands of hours.  Finding the budget and doing the research to obtain the right tools for the enterprise should be a priority for organizations in 2012.

Another Must-Read Case: J-M Manufacturing v. McDermott, Will & Emery (CA Super. Jun. 2, 2011)

Conclusion

There were any number of other significant cases from 2011 that could have made this list.  We invite you to share your favorites in the comments section or contact us directly with your feedback.

For more on the cases discussed above, watch this video:

Top Ten eDiscovery Predictions for 2012

Thursday, December 8th, 2011

As 2011 comes quickly to a close we’ve attempted, as in years past, to do our best Carnac impersonation and divine the future of eDiscovery.  Some of these predictions may happen more quickly than others, but it’s our sense that all will come to pass in the near future – it’s just a matter of timing.

  1. Technology Assisted Review (TAR) Gains Speed.  The area of Technology Assisted Review is very exciting since there are a host of emerging technologies that can help make the review process more efficient, ranging from email threading, concept search, clustering, predictive coding and the like.  There are two fundamental challenges however.  First, the technology doesn’t work in a vacuum, meaning that the workflows need to be properly designed and the users need to make accurate decisions because those judgment calls often are then magnified by the application.  Next, the defensibility of the given approach needs to be well vetted.  While it’s likely not necessary (or practical) to expect a judge to mandate the use of a specific technological approach, it is important for the applied technologies to be reasonable, transparent and auditable since the worst possible outcome would be to have a technology challenged and then find the producing party unable to adequately explain their methodology.
  2. The Custodian-Based Collection Model Comes Under Stress. Ever since the days of Zubulake, litigants have focused on “key players” as a proxy for finding relevant information during the eDiscovery process.  Early on, this model worked particularly well in an email-centric environment.  But, as discovery from cloud sources, collaborative worksites (like SharePoint) and other unstructured data repositories continues to become increasingly mainstream, the custodian-oriented collection model will become rapidly outmoded because it will fail to take into account topically-oriented searches.  This trend will be further amplified by the bench’s increasing distrust of manual, custodian-based data collection practices and the presence of better automated search methods, which are particularly valuable for certain types of litigation (e.g., patent disputes, product liability cases).
  3. The FRCP Amendment Debate Will Rage On – Unfortunately Without Much Near Term Progress. While it is clear that the eDiscovery preservation duty has become a more complex and risk laden process, it’s not clear that this “pain” is causally related to the FRCP.  In the notes from the Dallas mini-conference, a pending Sedona survey was quoted referencing the fact that preservation challenges were increasing dramatically.  Yet, there isn’t a consensus viewpoint regarding which changes, if any, would help improve the murky problem.  In the near term this means that organizations with significant preservation pains will need to better utilize the rules that are on the books and deploy enabling technologies where possible.
  4. Data Hoarding Increasingly Goes Out of Fashion. The war cry of many IT professionals that “storage is cheap” is starting to fall on deaf ears.  Organizations are realizing that the cost of storing information is just the tip of the iceberg when it comes to the litigation risk of having terabytes (and conceivably petabytes) of unstructured, uncategorized and unmanaged electronically stored information (ESI).  This tsunami of information will increasingly become an information liability for organizations that have never deleted a byte of information.  In 2012, more corporations will see the need to clean out their digital houses and will realize that such cleansing (where permitted) is a best practice moving forward.  This applies with equal force to the US government, which has recently mandated such an effort at President Obama’s behest.
  5. Information Governance Becomes a Viable Reality.  For several years there’s been an effort to combine the reactive (far right) side of the EDRM with the logically connected proactive (far left) side of the EDRM.  But now, a number of surveys have linked good information governance hygiene with better response times to eDiscovery requests and governmental inquires, as well as a corresponding lower chance of being sanctioned and the ability to turn over less responsive information.  In 2012, enterprises will realize that the litigation use case is just one way to leverage archival and eDiscovery tools, further accelerating adoption.
  6. Backup Tapes Will Be Increasingly Seen as a Liability.  Using backup tapes for disaster recovery/business continuity purposes remains a viable business strategy, although backing up to tape will become less prevalent as cloud backup increases.  However, if tapes are kept around longer than necessary (days versus months) then they become a ticking time bomb when a litigation or inquiry event crops up.
  7. International eDiscovery/eDisclosure Processes Will Continue to Mature. It’s easy to think of the US as dominating the eDiscovery landscape. While this is gospel for us here in the States, international markets are developing quickly and in many ways are ahead of the US, particularly with regulatory compliance-driven use cases, like the UK Bribery Act 2010.  This fact, coupled with the menagerie of international privacy laws, means we’ll be less Balkanized in our eDiscovery efforts moving forward since we do really need to be thinking and practicing globally.
  8. Email Becomes “So 2009” As Social Media Gains Traction. While email has been the eDiscovery darling for the past decade, it’s getting a little long in the tooth.  In the next year, new types of ESI (social media, structured data, loose files, cloud context, mobile device messages, etc.) will cause headaches for a number of enterprises that have been overly email-centric.  Already in 2011, organizations are finding that other sources of ESI like documents/files and structured data are rivaling email in importance for eDiscovery requests, and this trend shows no signs of abating, particularly for regulated industries. This heterogeneous mix of ESI will certainly result in challenges for many companies, with some unlucky ones getting sanctioned because they ignored these emerging data types.
  9. Cost Shifting Will Become More Prevalent – Impacting the “American Rule.” For ages, the American Rule held that producing parties had to pay for their production costs, with a few narrow exceptions.  Next year we’ll see even more courts award winning parties their eDiscovery costs under 28 U.S.C. §1920(4) and Rule 54(d)(1) FRCP. Courts are now beginning to consider the services of an eDiscovery vendor as “the 21st Century equivalent of making copies.”
  10. Risk Assessment Becomes a Critical Component of eDiscovery. Managing risk is a foundational underpinning for litigators generally, but its role in eDiscovery has been a bit obscure.  Now, with the tremendous statistical insights that are made possible by enabling software technologies, it will become increasingly important for counsel to manage risk by deciding what types of error/precision rates are possible.  This risk analysis is particularly critical for conducting any variety of technology assisted review process since precision, recall and f-measure statistics all require a delicate balance of risk and reward.

Accurately divining the future is difficult (some might say impossible), but in the electronic discovery arena many of these predictions can happen if enough practitioners decide they want them to happen.  So, the future is fortunately within reach.

Enterprise Strategy Group (ESG)’s Legal Trends Survey Reveals Alarming Inattention to eDiscovery Spending

Monday, December 5th, 2011

In their latest survey, entitled “E-Discovery Market Trends: A View from the Legal Department,” Enterprise Strategy Group (ESG) analysts Brian Babineau and Katey Wood analyze a number of interesting statistics and provide a range of insightful conclusions.  By surveying general counsel from large, mid-market (500-999 employees) and enterprise-class organizations in North America they were able to dive into a range of eDiscovery topics, including pain points, operational expenses and prioritizations on a go-forward basis.  Some are more intuitive than others, but in either case the results serve as good calibration metrics for those who endeavor to understand the corporate eDiscovery state of the nation.

“Most corporations are not tracking e-discovery spending…” In what may be the most notable finding of this ESG report, 60% of survey respondents claim that they did not track annual eDiscovery spending in 2010.  The authors correctly note that the eDiscovery process, “which can be highly unpredictable due to its project-by-project nature to begin with, has historically been outsourced to service providers charging at variable rates and often billed back to companies via their law firms.”  Despite the significant challenges of tracking eDiscovery spending, it’s nevertheless irresponsible for organizations to keep their heads in the sand regarding such a significant operational expense.

As the old saw goes, “you can’t manage what you can’t measure,” so it’s almost inconceivable to think that so many organizations aren’t tracking such a significant expense category.  For organizations who want to create a repeatable business process, as opposed to the fire-drill chaos that is typically associated with eDiscovery, it’s vitally important to accurately capture core eDiscovery metrics.  For starters, it’s useful to understand basic collection parameters, such as of the typical numbers of key custodians, average data volumes per custodian, data expansion rates, de-duplication statistics, etc.  Once these metrics are in place, it then becomes possible to manage the process and reduce costs.

Katey went on to expound in an exclusive quote for EDD 2.0:

“E-discovery can be managed as a strategic business process with an understanding of costs, performance and outcomes. When there’s no basis for reporting or comparison, it’s pin the tail on the donkey.  Corporate litigants won’t ever know they’re getting their money’s worth if they don’t even know what they’re spending.”

“E-Discovery accuracy/efficiency isn’t being measured, in large part.” Similar to the failure to measure eDiscovery costs, a full two thirds of GCs (67%) aren’t tracking the “efficiency and/or accuracy of e-discovery document review.” Until corporate counsel can link expectations of competency/efficiency with oversight and performance metrics, outside law firms will likely avoid having their feet held to the fire.  This passive stance makes transparency and process improvement difficult at best.  Additionally, this model of having expectations for efficiency, with low or no accountability, doesn’t bode well for the quick adoption of enabling technologies like predictive coding, since the driver has to inherently be the need/desire for increased efficiency (which axiomatically equals lower law firm review bills).

“Corporate information governance and litigation readiness (especially defensible deletion) are a priority, but not yet a reality.” From an internal prioritization perspective, more than two thirds (69%) of respondents identified their desire to expire/delete data more consistently, “thereby limiting unnecessary data retention for future litigation requests.”  Savvy enterprises correctly recognized the “multi-prong threat of unregulated data retention: the large amounts of irrelevant data ultimately produced for legal review, the greater difficulty of hanging onto potentially litigious documents past their required retention periods.”

This finding is very encouraging, and it ties into the upward momentum the industry is seeing regarding information governance generally – particularly linking the reactive (right) side of the EDRM with the logically connected and proactive (left) side of the EDRM.  As a good first step it’s critical to see organizations now associating good information governance hygiene with lower costs and better eDiscovery response times.  The ESG finding also triangulates with results from the recent Information Retention and eDiscovery Survey, which found that companies having good information governance hygiene were often able to respond much faster and more successfully to an eDiscovery/investigation requests, often suffering fewer negative consequences.

The only downside to the positive information governance trend, as reported by the survey, was that,

“while there are great benefits to defensible deletion, internal initiatives for implementing it too often are stymied by difficulty in obtaining cross functional consensus and authorization, particularly as it touches so many other critical processes like regulatory compliance and legal hold.”

“Legal hold processes are still very manual.” Another similar question revealed that many companies are attempting to get their information governance house in order, but are still in the very early stages.  When asked about their  current legal hold notification and tracking process, a whopping 69% of organizations said that they are using a “manual process performed by internal staff using e-mail and spreadsheets, etc.”  And, another 6% said they either had no formal process or tracking mechanism.

Given the risks attendant to flaws in the preservation process this area is ripe for improvement.  The good news is that 54% of survey respondents are intending to improve their legal hold process, with 25% planning improvement within the next 12 months.  This is a healthy acknowledgement that there is risk, and with a modicum of investment (time, personnel, procedures, and technology) the legal hold area can be brought up to current best practices.

The ESG survey is a welcome temperature gauge into the state of corporate legal departments.  It notes, in conclusion, “with the staggering growth, diversity and dispersion of data, the pain e-discovery is currently causing large and serial litigants are only a symptom of the larger problem of unwieldy and under-developed information management affecting all businesses.”  With data insights from the ESG survey, it’s becoming clear that foundational information governance elements (like deploying auditable legal hold procedures, tracking eDiscovery spending, updating data maps, etc.) are desperately needed by the many organizations that want to turn eDiscovery into a repeatable business process.  The good news is that many of these organization have improvements in mind for the next 12 months, and the challenge will be to make sure these proactive projects maintain the same level of organizational urgency that it often present for more reactive tasks.

Key eDiscovery Considerations for Selecting a Cloud Service Provider

Tuesday, October 25th, 2011

The data explosion that has burdened organizations across the globe for the past decade has become increasingly expensive to manage.  Many experts point to storage as the most obvious culprit for higher information governance costs.  There are, however, other factors driving those costs.  For example, demands for electronically stored information in legal and regulatory proceedings have significantly increased expenses surrounding data management.  Those demands have forced organizations to meet the high expectations that courts and regulatory bodies have for how they address their information or face the consequences.

Those consequences include sanctions and regulatory fines for groups that fail to account for how they store, manage and discover their information.  The $919 million verdict rendered in the E.I. du Pont de Nemours v. Kolon Industries case is paradigmatic of this trend.  That verdict was inextricably intertwined with the court’s instruction to the jury that executives and employees for defendant Kolon Industries deleted key evidence after the company’s preservation duty was triggered.

Going to Cloud Services for Data Archiving and eDiscovery

These rising data costs – and the risks they pose – are driving organizations to explore new technologies and methods for managing their data.  The latest alternative to traditional on-premise solutions involves leveraging cloud-based services.

The hype surrounding the cloud has generally focused on the opportunity for cheap and unlimited storage.  While cost effective data storage is important, that factor alone should not be determinative for selecting a cloud service provider.  Organizations must have the actual – not theoretical – ability to retrieve their data and do so in real time.  Otherwise, they may not be able to satisfy legal or regulatory requests, let alone the day-to-day demands of their operations.

In an analogous context, courts have traditionally compelled paper document productions even though the requested materials may be buried in a messy warehouse.  In one such case from this year, a U.S. district court in New York ordered a company to turn over decades-old records that were commingled with other materials in poorly labeled, shrink-wrapped boxes.  The court reasoned that disorganized record-keeping should not excuse an organization from producing relevant information.  See Brooks v. Macy’s (S.D.N.Y. May 6, 2011).

The rationale from the Brooks case is equally applicable to cloud-based services.  Cloud-based data must be intelligently organized so that companies can retrieve data in a timely fashion for business and legal purposes.  Otherwise, the savings achieved through cheap storage will be negated by the resulting legal quagmire.

Paring Back Superfluous and Duplicative Information

To facilitate the data retrieval process, the right cloud service provider should have the capacity to implement and observe applicable company retention policies.  An effective retention policy will generally help a company retain information that must be kept for business, legal or regulatory purposes – and nothing else.  The service provider should enable automated retention rules to ensure that information is kept only for a designated time period.  This will allow data to be expired once it reaches the end of that period.  And by expiring that data, the company will limit the amount of potentially relevant information available for follow-on litigation.

The pool of information can also be decreased through single instance storage.  This deduplication technology eliminates redundant data by preserving only a master copy of each document placed into the cloud.  This will reduce the amount of data that needs to be identified, collected and reviewed as part of the electronic discovery process.  For while unlimited data storage may seem ideal now, reviewing unlimited amounts of data will quickly become a logistical and costly nightmare.

Tools to Facilitate Discovery

A cloud service provider should ideally have eDiscovery functionality.  At a minimum, the service provider should be able to deploy legal holds to prevent users or automated policies from overwriting and destroying data.  Advanced search capabilities should also be included within the cloud-based service to reduce the amount of data that must be analyzed and then reviewed.  Moreover, the provider should support compatible load formats for export to third party review software.

Another key discovery issue is whether the cloud service provider can establish a clear audit trail for transmissions of company data.  Since information could be modified in transit by the routine operation of a service provider’s computer systems, an audit trail is necessary to prove that company documents and their metadata were not affected or otherwise compromised during transmission.  Without this assurance, a company may not be able to demonstrate the authenticity of its data before a tribunal or comply with key regulations.

A cloud server provider that can quickly retrieve and efficiently discover data has the potential to help organizations address their legal and regulatory demands in a cost effective manner.  Such a provider may be just the solution for organizations that are looking to properly address their runaway information governance costs.

Jumping the Gun? Three Approaches to Drafting New Federal Discovery Rules

Thursday, September 1st, 2011

In my last post I announced that discussions are taking place that could change the way preservation and sanctions issues are handled within the federal court system.  The next round of discussions about possible amendments to the Federal Rules of Civil Procedure (FRCP) is scheduled to take place on September 9th in Dallas, Texas as part of a “mini-conference” led by the Discovery Subcommittee – a committee appointed by the Advisory Committee on Civil Rules.  This post discusses three different rule amendment approaches that attendees have been asked to consider in order to help them prepare for the mini-conference.  A complete list of attendees, preparation materials, and questions the group will consider are included in the Advisory Committee’s June 29, 2011 memorandum to the participants.

The debate about whether or not rule amendments are even required is far from over.  A 452-page document located on the U.S. Courts’ website chronicles many of the meetings, notes, and submissions driving the current discussion.  Page 265 of the document contains a memorandum prepared by the Civil Rules Advisory Committee earlier this year, stating that:

“the Subcommittee has reached no conclusion on whether rule amendments would be a productive way of dealing with preservation/sanctions concerns, much less what amendment proposals would be useful.”

Despite concerns that amending the current rules now would amount to jumping the gun, there is an undeniable desire for more clarity around when the duty to preserve electronically stored information (ESI) is triggered, what must be preserved, and when the duty expires.  This momentum has resulted in the crafting of draft proposals that are likely to help frame the discussion on September 9th. The “proposals” are really draft approaches that have been broken down into three general categories described in the Civil Rules Advisory Committee’s memorandum, titled: “PRESERVATION/SANCTIONS ISSUES” (see page 263).  The Category 1 approach can best be described as providing a higher degree of specificity than the other approaches.  For example, the Category 1 approach provides a fairly detailed explanation of the duty to preserve evidence (Rule 26.1(a)) and details possible triggers (26.1(b)), the scope of the duty to preserve (26.1(c)), and sanctions (Rule 37).  Category 2 proposes a more general preservation rule, while Category 3 only addresses sanctions as a tool for influencing behavior.  The three categories are discussed in more detail below.

Category 1: Specific Rule

This draft includes many different exemplary lists, alternative approaches, and footnotes that highlight the fact that one of the key challenges with drafting a specific rule is trying to foresee all of the challenges that might lie in the road ahead.  For example, the draft rule provides a long list of events that could trigger the duty to preserve evidence, including everything from serving a pleading to taking “any other action” in anticipation of litigation.   The rule also provides a list of information types that are “presumptively excluded” from the preservation duty, such as deleted data on hard drives, temporary internet files, and physically damaged media.

The lists are helpful in that they provide guidance.  However, each list also includes a “catch-all” provision to address scenarios that might not be foreseeable.  The inclusion of catch-all provisions highlights the inherent challenge of providing more clarity and certainty without creating rules that are so inflexible that they are difficult to apply to unforeseen factual scenarios or technological developments.  Some might argue that trying to provide a laundry list of examples will make passage of new rules difficult because each item on the list will stir debate.  Others contend that the lists add little value because the catch-all provisions will still require litigators to pass the sniff test of “reasonableness.”

Despite the inherent challenges related to drafting rules with specificity, most practitioners would likely support the inclusion of lists or examples that provide at least some direction.  What is likely to be far more controversial with respect to Category 1 is the use of alternative language proposing fixed limits around custodians and litigation holds.  For example, one alternative would limit data preservation requirements to a fixed number of custodians and the duty to preserve evidence would similarly expire after a fixed number of years.  Bright line rules like these may be easier to understand, but they also tend to be controversial since they lack the flexibility necessary to fairly address every conceivable situation.

Category 2: General Rule

Like the Category 1 proposal, the Category 2 proposal uses lists and outlines several alternative approaches throughout the rule.  However, the Category 2 proposal fundamentally differs from Category 1 by outlining a more general approach.  For example, one of the alternatives essentially states that the duty to preserve evidence is triggered whenever a “reasonable person” would expect to be a party to an action.  Similarly, the ongoing duty to preserve information after the duty has been triggered would be evaluated based on what is described as a “reasonable period” under the circumstances.

The beauty of this more general approach lies in its simplicity and flexibility.  The idea is that evaluating conduct based on the “reasonableness” of a person’s actions is much easier than attempting to draft bright line legal guidelines that account for every possible factual scenario.  The flip side is that reasonable minds could differ and results could be inconsistent if there are no bright line rules.  What this means in the context of the federal rule discussion is that one judge might find a party’s conduct with respect to data preservation efforts reasonable, while another judge might issue sanctions based on the same set of facts.  In large part, it is this lack of certainty and guidance in the current rules that sparked the current debate in the first place.

Category 3: Sanctions-Based Rule

Unlike the first two categories, the Category 3 approach focuses only on sanctions and would act like more of a “back-end” rule.  In other words, the rule would not contain any specific directives about preservation, but it would provide direction in the areas of when and how sanctions might be applied.

Despite the draconian image a “sanctions” based rule might conjure up, the Category 3 rule may seem surprisingly lenient to some.  For example, absent extraordinary circumstances, the court would be prohibited from imposing any of the sanctions listed in Rule 37(b)(2) or from giving an adverse-inference instruction unless:

“the party’s failure to preserve discoverable information was willful or in bad faith and caused [substantial] prejudice in the litigation.”

The sanctions based approach would almost certainly have an impact on how parties handle upstream preservation related issues.  However, the key ingredients that will impact what kind of behavior this rule drives are the severity of the threatened sanction as well as the applicable standard.  For example, a party facing severe sanctions for conduct that is either negligent, willful or in bad faith is likely to take their preservation obligations seriously.  On the other hand, if the realm of possible sanctions is trivial, parties are less likely to take their preservation related obligations seriously.

Conclusion

The three rule approaches represent very early attempts at framing possible approaches to amending the FRCP.  If the Discovery Subcommittee chooses to recommend rule amendments following the September 9th mini-conference in Dallas, the proposed language is likely to be closer to final form and easier to assess than the current proposals.  I will continue to monitor the rule making discussion and provide commentary in future posts.  Stay tuned for my next post where former US Magistrate Judge Ron Hedges explains why he thinks the rule changes are unnecessary and why the current proposals might run afoul of the Rules Enabling Act.

Clearwell Doubles Down on Review

Monday, August 22nd, 2011


(Editor’s note: This special guest post was written by Chitran
g Shah, Clearwell Principal Product Manager. He is an RIT alum and avid hiker who works with our engineering team and lead customers to optimize the product for large-scale review. – Kurt)

As we’ve previously shared, our product strategy throughout 2009 and 2010 was to expand the product footprint across the EDRM as customers were demanding a single, end-to-end eDiscovery product. During this period we successfully expanded from our roots in processing, search and analysis to review and production (August 2009), identification and collection (September 2010) and legal hold workflow (March 2011). Over the last several months, our focus has been to go deep in each of these modules and provide features that deliver even greater return on investment to our customers.

Today, I am excited to announce significant new features and feature enhancements to the Clearwell Review and Production Module and say a few words about what motivated us to build these features and how they enable our customers to further streamline their legal review workflow.

There are several exciting features in this release, but I would to like to highlight three in particular:

1. Ability to seamlessly import production load files

Most matters require reviewing relevant documents alongside the documents received from third parties, opposing parties, and even previous litigations. With the new load file import feature, users can now streamline the process of importing load files with three simple steps.

In Step 1, a step-by-step wizard-like interface guides users though the selection of formatting information such as field delimiters and nested value delimiters, metadata information such as bates numbers, family relationships, tags, folders and any number of custom attributes, and content information such as images, extracted text and native files. When the load file has both extracted texts and native files, the wizard gives users an option to specify which content should be used for searching.

In Step 2, the system performs a deep validation of the load file and generates a report documenting any inconsistencies such as missing bates numbers or missing values for required fields found in the load file. As a result, customers have the ability to quickly find and fix any issues with the load file before the import begins.

In Step 3, the system imports the documents and builds analytics. Once this step completes, the imported documents, including all metadata and content, are available for viewing and searching.

All the analytics capabilities customers are familiar with, such as discussion threads and concept search, are also available for documents imported from load files. This allows users to quickly discover documents in the load file that are conceptually similar to natively processed documents, for example.

2. Support for large scale reviews and productions

As the volume of electronically stored information (ESI) continues to grow, our customers find themselves reviewing and exporting more and more documents, and they need a solution that can cope with the massive growth in data. At the same time, they don’t want to spend large sums of money building a server farm in anticipation of the growth. They want the flexibility to add capacity when needed and remove it when not needed.

Clearwell’s scale-out architecture enables administrators to easily add appliances and allocate them to a particular matter and to a specific task using a point-and-click interface.

For example, if an administrator needs to increase the number of reviewers from 200 to 400 in order to meet a tight deadline, he or she can easily add 2 appliances to the cluster and assign them for review. Once the review completes, the administrator can now easily re-assign these appliances for production, allowing users to easily meet deadlines while reducing their overall hardware costs.

This flexibility allows our customers to maximize the use of their hardware resources while providing infinite review, export and production scalability.

3. Streamlined management of exports and productions

Clearwell provides powerful export options, and while our customers use them extensively for creating a variety of different production formats, they typically standardize on a few. Clearwell’s new case export and production templates provide a quick and easy way for case administrators to define the export format once and use it across multiple cases. When exporting documents, users can simply select a template from the list of visible templates in that case. This capability significantly reduces the overhead associated with managing export formats and allows our customers to produce documents in a consistent format across multiple matters.

Additionally, new production pre-mediation reports automatically identify problem documents and group them by issue type for quick resolution. This enables users to preemptively identify and resolve document production issues without delaying entire productions.

Says Wendy Butler Curtis, chair of Orrick, Herrington & Sutcliffe’s eDiscovery Working Group, “Legal review is one of the most challenging phases of the eDiscovery process. As electronic data volumes continue to grow, it is increasingly important to leverage technologies that can streamline and improve legal review, ensure defensibility and reduce costs. Solutions like the Clearwell eDiscovery Platform enable legal teams to create an iterative eDiscovery workflow that allows for more efficient and effective large-scale review.”

We will be showcasing the new features at ILTA (Booth 816) this week in Nashville, so come see us and let us know what you think.

(Chitrang Shah is a Principal Product Manager at Clearwell Systems, now a part of Symantec, and the lead Product Manager for Clearwell’s Processing & Analysis and Review & Production Modules)

Patents and Innovation in Electronic Discovery

Monday, June 13th, 2011

In the world of technology we live in, a huge amount of benefit is created when people apply certain well-known techniques to solve problems and create value to the broader community. Such techniques are often the result of painstakingly long and laborious research, driven primarily by academic institutions with private industry either funding such research directly or by co-opting them in their own work. When the industry as a whole recognizes a certain methodology, it gains popular usage.

In information retrieval, searching and retrieving relevant content from unstructured text has been a vexing problem, and we’ve had decades of the brightest minds applying their collective intelligence and the rigors of peer review to validate and establish the most effective way to solve a retrieval problem. And, research forums such as TREC, SIGIR and other information retrieval conferences establish a venue for advancing the state of the art. So, when Recommind announced that they have been issued a patent on Predictive Coding, I took notice, especially since it touches a nerve with those who believe research should be openly shared.

The patent lists six claims that describe a workflow whereby humans review and code a document and the coding decisions applied to the document sample are projected or applied to the larger collection of documents. Anyone who has even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback as a way to perform information retrieval has been studied for well over forty years, with a paper as early as 1968 by Rocchio J.J., titled Relevance Feedback in Information Retrieval. It falls under a category of methods broadly known as machine learning.

Any supervised machine learning system involves creating a training sample and using that sample to project into a larger population. The fact that one could claim patentable ideas on something that is so widely known and used is puzzling.  Any workflow that employs machine learning would include the steps of creating an initial control set, coding that by human review, and applying the learned tags to a larger population.  In fact, the Wiki article Learning to rank describes precisely the workflow that is claimed in the patent and as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper with iterative sampling based evaluation and automatic expansion of initial query.  In that paper, we describe exactly the workflow postulated by the six claims of the patent.

In terms of other prior art that would potentially invalidate the patent, the list is long. Let’s start with Text Classification. Text Classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the Proceedings of Sixteenth International Conference on Machine Learning, as well as his book Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, published by The Springer International Series in Engineering and Computer Science.  Now a well-recognized Professor of Computer Science at Cornell University, that work is widely cited as a seminal work on the area of machine learning and text classification. Interestingly, this work was cited by the Patent Examiner as prior art, but the inventors missed listing it. Nevertheless, that work and further work by several academics such as Leopold and Kindermann has already established the use of Support Vector Machines as a useful technique for machine learning. To claim the novelty of its use in automatically coding documents is, in my opinion, a hollow claim.

Another technology mentioned in passing is Latent Semantic Indexing (LSI). This is proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W.,Landauer, T.K., Harshman R. in their paper, Indexing by Latent Semantic Analysis, in Journal of the ASIS, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching and text classification is also very widespread, and once again, it seems ridiculous to claim that it is something novel or innovative.

Next, let’s examine the use of sampling to validate the initial control set. Use of sampling for validation of a control set of documents is in fact such a widely known technique that most e-discovery productions employ sampling. In fact, the Sedona Commentary on Achieving Quality and the EDRM Search Guide recommend use of sampling to validate automated searches. Furthermore, several E-discovery opinions such as Judge Grimm’s opinion in Victor Stanley [Victor Stanley, Inc. v. Creative Pipe, Inc. , 2008 WL 2221841 (D. Md., May 29, 2008)]  suggests that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.

In short, we think the claims issued in the patent and the associated workflow are so commonly used that the workflow is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant a re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to pick up existing prior art and devise a similar workflow that achieves the same or better outcome, and attempt to enforce the patent will likely be challenged.

But there is an even bigger issue at stake here beyond the status of Recommind’s patent: namely, shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license their technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery over two years before Recommind announced their first version of Axcelerate.

Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the sort of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect investments by an organization are fine, but process patents that attempt to patent a workflow are bad for business. With such an approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much more room to grow and develop) can be fully realized in a way that both provides vendors with the fair and just economic rewards they deserve while helping the legal system become radically more efficient.

Go With the (Work)flow in Electronic Discovery

Thursday, June 10th, 2010

Recently, I attended a conference in Washington DC with a large number of government agencies, including (I must confess) many Clearwell customers like the Department of Health and Human Services, the Department of Homeland Security, and the Veterans Administration. It will probably come as no surprise that, during our conversations, it became abundantly clear that they had substantial electronic discovery technology needs. Many were still reviewing PST files manually in Outlook; others were TIFFing millions of pages of documents prior to directly loading into a traditional review application for eyes-on review. That’s right, nary a trace of early case assessment, transparent search, or culling to be found.

Sadly, no news there. What was fascinating for us was the reaction to the latest release of the Clearwell E-Discovery Platform, Version 5.5. Version 5.5 contains significant new functionality, including dramatically increased performance and scalability along with a number of substantial processing, analysis, review, and production enhancements. But, in addition to these features, we have rolled out a set of e-discovery best practices templates designed to make it vastly easier for organizations to implement a formal e-discovery methodology that builds on the integrated nature of our platform. And it was the prospect of such a methodology, even more than the technology, that people were buzzing about at the summit.

Why? With all of the activity going on in the e-discovery space around product and technology innovation, there was some strong feedback that process and methodology may have gotten lost in the shuffle. And, if you think about it, it’s process and methodology that are likely to be most carefully assessed when the courts are considering the reasonableness (or lack thereof) of e-discovery for a case.

The importance of putting process and methodology front and center (along with a commitment to making the necessary organizational changes to make it happen) is not exactly a new concept. Ralph Losey has been talking about it for years over on his groundbreaking and irreverent e-discovery team blog, and it’s a frequent topic of keynote speakers on the e-discovery lecture circuit. However, like eating your vegetables or exercising, putting in place the right e-discovery process in an organization is something that people realize the benefit of, but still ignore.

This cannot continue, as the stakes are escalating. Take the recent case of Mt. Hawley Ins. Co. v. Felman Prod., Inc. Dean will dive into this case in much greater detail in an upcoming post, but it is very relevant to the methodology versus technology discussion in that it highlights how a methodology problem can cause a fateful technology problem to be overlooked. In this case, a lack of sufficient quality control processes caused the plaintiff to inadvertently produce a number of privileged emails. The court found the inadvertent production was not “solely attributable” to a problem with a Concordance index, and that the plaintiff “failed to perform critical quality control sampling” to determine whether the production was appropriate. Privilege was waived.

What’s the solution? We believe that we’re on to something with Clearwell 5.5, in that we can, uniquely among e-discovery products, marry together methodology and technology in a single platform that allows for the entire e-discovery process to be documented and defended, end-to-end. We have particularly focused on the most critical part of the process which seems to come up over and over again in sanction and privilege waiver decisions, which is the way that an organization moves from an initial pool of documents to a set of defensibly-culled, potentially responsive documents, on through to tagging and production. Our unique workflow capabilities allow the entire process to be documented and instantly recalled with the click of a mouse, letting you see each decision that was made during the course of the case in a step-by-step fashion, and then to structure additional quality control audits on top of those decisions to ensure that every “i” is dotted and every “t” crossed.

It’s a good thing for everyone involved in litigation that e-discovery technology is maturing rapidly to the point where it can start to help solve these sorts of process problems rather than being the cause of them (as was the unfortunately case in Mt. Hawley). This is a major focus for us at Clearwell and you’ll see a lot more exciting news from us on this front over the next few months, so stay tuned!