Judge Robert L. Miller Jr. of the United States District Court for the Northern District of Indiana recently addressed what has arguably become the hottest predictive coding issue since Judge Andrew J. Peck’s February 2012 order in Da Silva Moore v. Publicis Groupe. The issue is whether parties who use predictive coding technology to assist with document productions should disclose to the other side the non-responsive documents used to train their system.
Judge Peck Opens the Predictive Coding Door
In Da Silva Moore, Judge Peck became the first judge to state that the use of predictive coding technology is “acceptable in appropriate cases.” Since the decision, some litigation attorneys have criticized the predictive coding protocol the parties established. Central to that criticism is the inclusion of a provision requiring the voluntary disclosure of non-privileged documents used to train the predictive coding system.
Fearing a judicial trend, many attorneys have argued that the Federal Rules of Civil Procedure (Rules) simply do not require the disclosure of non-responsive documents under any circumstances. Others argue that a little cooperation and transparency between adversaries isn’t a bad thing when one party saves money and time and the other receives a more thorough production. Not surprisingly, both sides have eagerly awaited judicial guidance.
Judge Miller Tackles the Hot Button Issue
In In Re: Biomet, Judge Miller provided that long-awaited guidance by holding that Rule 26 does not require a party to disclose seed set documents used to train a predictive coding system. The order came on the heels of an earlier April 2013 order denying plaintiffs’ motion to compel Biomet to redo earlier document productions (unless plaintiffs paid). The plaintiffs argued that Biomet’s decision to use keyword search terms and de-duplication techniques to cull 19.5 million documents down to 2.5 million before using predictive coding technology “tainted” the production process. More specifically, plaintiffs contended that using keywords to filter out documents likely excluded responsive documents that should have been produced. Judge Miller found plaintiffs’ arguments unconvincing, largely because Biomet had already spent approximately $1.07 million on eDiscovery.
Four months later, plaintiffs filed another motion requesting more transparency into Biomet’s predictive coding process. Plaintiffs moved to compel Biomet to disclose and identify the initial seed set documents used to train the predictive coding system to distinguish between responsive and non-responsive documents. Plaintiffs reasoned that knowing which documents Biomet coded as responsive and non-responsive was necessary to measure the accuracy of Biomet’s production. In the order denying plaintiffs’ request, Judge Miller stated:
“As I understand it, a predictive coding algorithm offers up a document, and the user tells the algorithm to find more like that document or that the user doesn’t want more documents like what was offered up. The Steering Committee wants the whole seed set Biomet used for the algorithm’s initial training. That request reaches well beyond the scope of any permissible discovery by seeking irrelevant or privileged documents used to tell the algorithm what not to find. That the Steering Committee has no right to discover irrelevant or privileged documents seems self-evident.”
Judge Miller continued by acknowledging plaintiffs’ argument that Biomet was not proceeding in the cooperative spirit endorsed by the Sedona Conference Cooperation Proclamation and the 7th Circuit Pilot Program. However, he stated that:
“[N]either the Sedona Conference nor the Seventh Circuit project expands a federal district court’s powers, so they can’t provide me with authority to compel discovery of information not made discoverable by the Federal Rules.”
In particular, Judge Miller pointed to the language contained in FRCP 26(b)(1) as a basis for his decision. He concluded that because the plaintiffs knew of the “existence and location” of each discoverable document Biomet used in the seed set, Biomet had complied with its production obligation. Surprisingly, Judge Miller’s analysis did not specifically address what some may argue is the key language in FRCP 26(b)(1), which states:
“For good cause, the court may order discovery of any matter relevant to the subject matter involved in the action. Relevant information need not be admissible at the trial if the discovery appears reasonably calculated to lead to the discovery of admissible evidence.”
Judge Miller went on to criticize Biomet’s “unexplained lack of cooperation” and urged Biomet to rethink its refusal to at least reveal the responsive documents used in the seed set. His comments indicated that plaintiffs’ position would have been stronger if they had requested only the identification of the non-privileged, responsive seed set documents. However, he ultimately refused to compel the identity of any of the seed set documents because he lacked “any discretion in this dispute.”
Is the Issue Resolved?
Even though Judge Miller explained that he lacked “any discretion in this dispute,” some future litigants are likely to argue that Rule 26 provides judges with the discretion to order the disclosure of documents that are both non-responsive and non-privileged where appropriate. For example, proponents of disclosure are likely to argue that coding decisions applied to training documents could have a significant impact on the discovery of admissible evidence. If training documents are coded accurately, the system is more likely to surface whatever admissible evidence exists. On the other hand, opponents of disclosure are likely to respond sharply that sharing non-responsive documents has not been required in the past and should not be required in the future. In fact, following Da Silva Moore, some have argued that even keywords are work-product protected and should not be disclosed.
In Re: Biomet appears to be the first case addressing whether parties are obligated to share non-responsive documents used to train a predictive coding system, but it likely won’t be the last. First, as a district court order, the decision is not binding on other courts. Second, Judge Miller did not thoroughly address key language contained within Rule 26(b)(1), which invites further analysis. Lastly, the legal industry is struggling to define predictive coding best practices and to understand the range of different predictive coding technology solutions. Given the current confusion, demands for more predictive coding transparency are likely to continue as the market evolves. Don’t expect this hot button issue to cool off any time soon.
*Blog post co-authored by Matt Nelson and Adam Kuhn