<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>e-discovery 2.0 &#187; Victor Stanley</title>
	<atom:link href="http://www.clearwellsystems.com/e-discovery-blog/category/victor-stanley/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.clearwellsystems.com/e-discovery-blog</link>
	<description>thoughts about the evolution of e-discovery</description>
	<lastBuildDate>Fri, 10 Feb 2012 18:35:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Patents and Innovation in Electronic Discovery</title>
		<link>http://www.clearwellsystems.com/e-discovery-blog/2011/06/13/patents-and-innovation-in-electronic-discovery/</link>
		<comments>http://www.clearwellsystems.com/e-discovery-blog/2011/06/13/patents-and-innovation-in-electronic-discovery/#comments</comments>
		<pubDate>Mon, 13 Jun 2011 17:37:43 +0000</pubDate>
		<dc:creator>Venkat Rangan</dc:creator>
				<category><![CDATA[Clearwell]]></category>
		<category><![CDATA[e-discovery]]></category>
		<category><![CDATA[e-discovery software]]></category>
		<category><![CDATA[e-discovery workflow]]></category>
		<category><![CDATA[early case assessment]]></category>
		<category><![CDATA[ECA]]></category>
		<category><![CDATA[EDD]]></category>
		<category><![CDATA[ediscovery]]></category>
		<category><![CDATA[EDRM]]></category>
		<category><![CDATA[electronic data discovery]]></category>
		<category><![CDATA[electronic discovery]]></category>
		<category><![CDATA[ESI]]></category>
		<category><![CDATA[legal discovery]]></category>
		<category><![CDATA[litigation discovery]]></category>
		<category><![CDATA[litigation software]]></category>
		<category><![CDATA[litigation support software]]></category>
		<category><![CDATA[patent]]></category>
		<category><![CDATA[Recommind]]></category>
		<category><![CDATA[Victor Stanley]]></category>
		<category><![CDATA[e-mail]]></category>
		<category><![CDATA[ediscovery software]]></category>
		<category><![CDATA[information retrieval]]></category>
		<category><![CDATA[predictive coding]]></category>
		<category><![CDATA[Sedona]]></category>
		<category><![CDATA[Sedona Conference]]></category>
		<category><![CDATA[SIGIR]]></category>
		<category><![CDATA[text classification]]></category>
		<category><![CDATA[TREC]]></category>
		<category><![CDATA[workflow]]></category>

		<guid isPermaLink="false">http://www.clearwellsystems.com/e-discovery-blog/?p=1648</guid>
		<description><![CDATA[In the world of technology we live in, a huge amount of benefit is created when people apply certain well-known techniques to solve problems and create value to the broader community. Such techniques are often the result of painstakingly long and laborious research, driven primarily by academic institutions with private industry either funding such research [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin-right: 10px;" title="Recommind Ball and Chain" src="http://mylittlepail.com/wp-content/uploads/2008/12/ball_and_chain.jpg" alt="" width="190" height="181" />In the world of technology we live in, a huge amount of benefit is created when people apply well-known techniques to solve problems and create value for the broader community. Such techniques are often the result of long and painstaking research, driven primarily by academic institutions, with private industry either funding that research directly or building on it in its own work. When the industry as a whole recognizes a methodology, it gains popular usage.</p>
<p>In information retrieval, searching and retrieving relevant content from unstructured text has been a vexing problem, and we’ve had decades of the brightest minds applying their collective intelligence and the rigors of peer review to validate and establish the most effective way to solve a retrieval problem. And, research forums such as <a href="http://trec.nist.gov/" target="_blank">TREC</a>, <a href="http://www.sigir.org/" target="_blank">SIGIR</a> and other information retrieval <a href="http://academic.research.microsoft.com/RankList?entitytype=3&amp;topDomainID=2&amp;subDomainID=8&amp;last=0&amp;start=1&amp;end=100" target="_blank">conferences</a> establish a venue for advancing the state of the art. So, when Recommind announced that they have been issued a patent on Predictive Coding, I took notice, especially since it touches a nerve with those who believe research should be openly shared.</p>
<p>The patent lists six claims that describe a workflow in which humans review and code a sample of documents, and the coding decisions applied to that sample are then projected onto the larger collection of documents. Anyone who has even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback as a way to perform information retrieval has been studied for well over forty years, with a paper as early as 1968 by J.J. Rocchio titled <em>Relevance Feedback in Information Retrieval</em>. It falls under a category of methods broadly known as machine learning.</p>
<p>Any <a href="http://en.wikipedia.org/wiki/Supervised_learning" target="_blank">supervised machine learning system</a> involves creating a training sample and using that sample to project into a larger population. The fact that one could claim patentable ideas on something that is so widely known and used is puzzling. Any workflow that employs machine learning would include the steps of creating an initial control set, coding it by human review, and applying the learned tags to a larger population. In fact, the Wikipedia article <a href="http://en.wikipedia.org/wiki/Learning_to_rank" target="_blank">Learning to rank</a> describes precisely the workflow that is claimed in the patent. And as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper on iterative sampling-based evaluation and automatic expansion of an initial query. In that paper, we describe exactly the workflow postulated by the six claims of the patent.</p>
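<p>To make that generic workflow concrete, here is a minimal sketch of a centroid-based (Rocchio-style) classifier: code a small sample by hand, learn one average term vector per tag, and project the tags onto uncoded documents. The sample documents, tag names and function names are all hypothetical, and this is an illustration of textbook relevance feedback, not the method claimed in the patent or the one used in any product.</p>

```python
from collections import Counter
import math

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def centroid(docs):
    """Average term vector over a list of documents."""
    total = Counter()
    for doc in docs:
        total.update(vectorize(doc))
    return {term: count / len(docs) for term, count in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def train(coded_sample):
    """Step 1 and 2: group the human-coded sample by tag, build one centroid per tag."""
    by_label = {}
    for text, label in coded_sample:
        by_label.setdefault(label, []).append(text)
    return {label: centroid(docs) for label, docs in by_label.items()}

def predict(model, text):
    """Step 3: assign the tag whose centroid is most similar to the document."""
    vec = vectorize(text)
    return max(model, key=lambda label: cosine(vec, model[label]))

# Hypothetical human-coded control set.
coded_sample = [
    ("pipe corrosion caused the leak", "responsive"),
    ("fitting failed after corrosion", "responsive"),
    ("lunch menu for friday", "not responsive"),
    ("holiday party schedule", "not responsive"),
]

model = train(coded_sample)
uncoded = ["corrosion found in pipe fitting", "party menu"]
predictions = [predict(model, doc) for doc in uncoded]
print(predictions)
```

<p>A real system would add term weighting, far larger samples and repeated rounds of human feedback, but the three steps are the same ones described above.</p>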
<p>In terms of other prior art that would potentially invalidate the patent, the list is long. Let’s start with Text Classification. Text Classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the <em>Proceedings of Sixteenth International Conference on Machine Learning</em>, as well as in his book <em>Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms</em>, published by The Springer International Series in Engineering and Computer Science. Joachims is now a well-recognized Professor of Computer Science at Cornell University, and that work is widely cited as seminal in the area of machine learning and text classification. Interestingly, this work was cited by the Patent Examiner as prior art, but the inventors missed listing it. Nevertheless, that work and further work by several academics such as Leopold and Kindermann have already established the use of Support Vector Machines as a useful technique for machine learning. To claim the novelty of its use in automatically coding documents is, in my opinion, a hollow claim.</p>
<p>Another technology mentioned in passing is Latent Semantic Indexing (LSI). It was proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R. in their paper, <em>Indexing by Latent Semantic Analysis</em>, in <em>Journal of the ASIS</em>, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching and text classification is also very widespread, and once again, it seems ridiculous to claim that it is something novel or innovative.</p>
<p>Next, let’s examine the use of sampling to validate the initial control set. Use of sampling for validation of a control set of documents is in fact such a widely known technique that most e-discovery productions employ sampling. In fact, the <a href="http://www.thesedonaconference.org/dltForm?did=Achieving_Quality.pdf" target="_blank">Sedona Commentary on Achieving Quality</a> and the <a href="http://www.edrm.net/resources/guides/edrm-search-guide" target="_blank">EDRM Search Guide</a> recommend use of sampling to validate automated searches. Furthermore, several e-discovery opinions, such as Judge Grimm’s opinion in Victor Stanley [<em>Victor Stanley, Inc. v. Creative Pipe, Inc.</em>, 2008 WL 2221841 (D. Md., May 29, 2008)], suggest that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.</p>
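<p>As a rough illustration of what sampling-based validation can look like, the sketch below draws a random sample from the documents an automated search excluded and estimates how many responsive documents were missed. The document IDs, the sample size of 400, the count of 8 responsive documents and the normal-approximation interval are all assumptions chosen for the example, not figures from the Sedona or EDRM guidance.</p>

```python
import math
import random

def sample_documents(doc_ids, sample_size, seed=0):
    """Draw a simple random sample (without replacement) for human review."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)

def estimate_rate(hits, sample_size, z=1.96):
    """Point estimate plus a normal-approximation interval (z=1.96 is about 95%)."""
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# Hypothetical IDs of documents the automated search left out of the production.
excluded = list(range(10000))
sample = sample_documents(excluded, 400)

# Suppose reviewers find 8 responsive documents among the 400 sampled.
rate, low, high = estimate_rate(8, len(sample))
print(f"estimated miss rate: {rate:.1%} (interval {low:.1%} to {high:.1%})")
```

<p>If the estimated miss rate is higher than the parties can accept, the search is refined and the sample is drawn again.</p>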
<p>In short, we think the claims issued in the patent and the associated workflow are so commonly used that the workflow is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant a re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to pick up existing prior art and devise a similar workflow that achieves the same or better outcome, and any attempt to enforce the patent will likely be challenged.</p>
<p>But there is an even bigger issue at stake here beyond the status of Recommind’s patent: namely, shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license their technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery <a href="http://web.archive.org/web/20050214030849/http:/www.h5technologies.com/whatwedo/example3.html" target="_blank">over two years before Recommind announced their first version of Axcelerate</a>.</p>
<p>Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the sort of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect investments by an organization are fine, but process patents that attempt to patent a workflow are bad for business. With such an approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much more room to grow and develop) can be realized in a way that both provides vendors with the fair and just economic rewards they deserve and helps the legal system become radically more efficient.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clearwellsystems.com/e-discovery-blog/2011/06/13/patents-and-innovation-in-electronic-discovery/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Automated Review in Electronic Discovery Re-Visited</title>
		<link>http://www.clearwellsystems.com/e-discovery-blog/2010/06/28/automated-review-in-electronic-discovery-re-visited/</link>
		<comments>http://www.clearwellsystems.com/e-discovery-blog/2010/06/28/automated-review-in-electronic-discovery-re-visited/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 20:48:01 +0000</pubDate>
		<dc:creator>Dean Gonsowski</dc:creator>
				<category><![CDATA[e-discovery]]></category>
		<category><![CDATA[e-discovery software]]></category>
		<category><![CDATA[early case assessment]]></category>
		<category><![CDATA[EDD]]></category>
		<category><![CDATA[ediscovery]]></category>
		<category><![CDATA[electronic data discovery]]></category>
		<category><![CDATA[electronic discovery]]></category>
		<category><![CDATA[Electronically Stored Information]]></category>
		<category><![CDATA[ESI]]></category>
		<category><![CDATA[legal discovery]]></category>
		<category><![CDATA[litigation discovery]]></category>
		<category><![CDATA[litigation software]]></category>
		<category><![CDATA[litigation support software]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[Victor Stanley]]></category>
		<category><![CDATA[workflow]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[defensibility]]></category>
		<category><![CDATA[defensible e-discovery]]></category>
		<category><![CDATA[discovery]]></category>
		<category><![CDATA[e-mail]]></category>
		<category><![CDATA[ECA]]></category>
		<category><![CDATA[ediscovery software]]></category>
		<category><![CDATA[EDRM]]></category>
		<category><![CDATA[litigation workflow]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.clearwellsystems.com/e-discovery-blog/?p=934</guid>
		<description><![CDATA[Almost two years ago I wrote one of my first blog posts entitled “Review-less E-Discovery Review.”  Despite the tongue twister of a title, the post posited that “there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.” I’d like to take a look and [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="Automated Review in E-Discovery" src="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2010/06/t2.jpg" alt="e-discovery" width="200" height="293" /> Almost two years ago I wrote one of my first blog posts entitled “<a href="http://www.clearwellsystems.com/e-discovery-blog/2008/07/21/review-less-e-discovery-review/" target="_blank">Review-less E-Discovery Review</a>.”  Despite the tongue twister of a title, the post posited that “there is a very real possibility that we’re on the cusp of computers taking over a significant e-discovery task for attorneys.” I’d like to take a look and see how much (if at all) my prognostications have materialized.</p>
<p>A cynic might think that this is the moment where E-Discovery 2.0 <a href="http://en.wikipedia.org/wiki/Jump_the_shark" target="_blank">jumps the shark</a>.  But no, this isn’t one of those sitcom episodes where they flash back to previous shows as an easy way to recycle content.  Instead, it seems useful to see how the legal market has evolved from a litigation workflow perspective, particularly with some vendors touting the benefits of review-less technologies like predictive coding.</p>
<p>In the original blog, I noted that there was a “scenario where a non-manual review methodology may make sense” (while importantly noting that “this approach is not without risk”).  Since my last post there has been the successful adoption of <a href="http://federalevidence.com/resources502" target="_blank">Evidence Rule 502</a>, which makes this methodology (at least conceptually) safer.</p>
<p>But again (imagine <a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/FlashbackEffects" target="_blank">dreamy flashback mode</a>), here were the guidelines I previously proffered:</p>
<ol>
<li><strong>Large data set</strong>. This may sound a bit obvious, but a non-manual approach is best suited for large, unwieldy data sets. The corpus doesn’t need to be in the terabytes, but the data set should be evaluated in terms of discovery processing costs and attorney review estimates.</li>
<li><strong>Short Production Timelines</strong>. Once the above calculations are conducted, the next step is to determine if a human-based review could even conceivably be conducted in the given time frame. In many instances, an eyes-on review process just won’t be feasible since there won’t be enough bodies to throw at the problem.</li>
<li><strong>Next Gen “PAR” Tools</strong>. In order to pull this “review-less” review process off, both safely and quickly, the responding party needs to have access to fast, robust processing, analysis and review (“PAR”) tools. Certainly, it’s possible to have this scenario work with an e-discovery service provider, if they have the capability.</li>
<li><strong>Relatively Small Amount in Controversy</strong>. For the time being, this approach should not be considered for any “bet the company” litigation, nor anything with significant downside risk (governmental inquiries, punitive damages, class actions, 2nd requests, etc.). Yet, for many standard commercial lawsuits, corporate investigations, HR claims, etc. this review-less approach may be worth considering.</li>
<li><strong>Ability to Use a Clawback Provision</strong>. Entering into a clawback provision with the opposition is mandatory in this methodology since the chances of an inadvertent production are statistically ever-present. Yet, until Evidence Rule 502 is resolved, there will always be a risk that the clawback won’t be enforceable against 3rd parties.</li>
<li><strong>Non-governmental Production</strong>. Most information in governmental productions becomes part of the public record, meaning that a clawback isn’t going to be feasible. Here, trade secret information, personally identifiable data and the like would be disastrous if pushed out into the public domain.</li>
</ol>
<p>The goal of this post is to see if this dog is any more ready to hunt than it was two years ago.  The short answer (right now) appears to be: No.</p>
<p>We all know that litigators are both risk averse and generally <a href="http://blogs.law.harvard.edu/amy/2008/09/05/aba-survey-says-lawyers-still-slow-to-adopt-technology/" target="_blank">slow to adopt new technology approaches</a>.  This is particularly true when there’s a perception that they won’t have insight into the technological <a href="http://en.wikipedia.org/wiki/Black_box" target="_blank">black box</a> behind automated coding/tagging decisions.  Litigators are understandably sensitive about the ability to prove up the reasonability of their search and review processes.  This “reasonableness” requirement lines up both with the <a href="http://commonscold.typepad.com/eddupdate/2008/06/victor-stanley.html" target="_blank"><em>Victor Stanley</em></a> requirements and FRE 502(b), which eliminates the chance of a waiver only “if the holder of the privilege or work product protection took reasonable precautions to prevent disclosure.”</p>
<p>Given this ongoing hesitancy, the question remains: shouldn’t we be seeing more movement in automated review than the glacial progress that’s been achieved to date, particularly with the known shortcomings of the eyes-on review process?  Most are familiar with the 1985 <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.1144" target="_blank">STAIRS study by Blair and Maron</a>, in which the lawyers thought they had found 75% of the relevant documents using Boolean keyword searches &#8211; when they had actually found only 20%.</p>
<p>But, despite the known deficiencies of eyes-on review, it falls into the “<a href="http://www.usingenglish.com/reference/idioms/better+the+devil+you+know.html" target="_blank">go with the devil you know</a>” mindset that often makes sense when dealing with judges and juries who aren’t likely to grok newer-fangled approaches.</p>
<p>In addition to these high-level, almost dogmatic challenges, there is one other tactical element I’d add to my previous list (of 6 factors).</p>
<p><strong>7. All documents processed up-front (no rolling collection). </strong>I’ve heard some in-the-trenches e-discovery experts claim that they’ve never had a case that didn’t involve at least some level of incremental data collection. Whether this is an overstatement is immaterial. The fact is that a large number of e-discovery projects involve ESI that is collected (and then processed) in dribs and drabs. This is often a good thing, largely attributable to the incremental (start slowly) nature of a well-thought-out e-discovery project, where a smaller number of initial custodians are processed, then ECA is conducted, and only then is the additional ESI added to the corpus. This common methodology causes some significant heartburn for a review-less methodology, since the ever-changing nature of the corpus makes it difficult or impossible for a sample to be truly extensible to what will eventually be the entire data set. For this reason, the review-less approach should be limited to cases where the entire corpus is collected and processed at once.</p>
<p>In sum, the seven foregoing factors appear to still be largely valid and create an environment where an automated, review-less methodology will only make sense in a relatively rare set of circumstances.  This may change in the future, but given the risk-averse DNA of most litigators I can’t imagine this tipping point happening any time soon.</p>
<p>Learn More On <a href="http://www.clearwellsystems.com/e-discovery-customers/litigation-support-software.php">Litigation Software</a> &amp; <a href="http://www.clearwellsystems.com/electronic-discovery-solutions/electronic-discovery-litigation.php">Electronic Discovery Litigation</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.clearwellsystems.com/e-discovery-blog/2010/06/28/automated-review-in-electronic-discovery-re-visited/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>As the Electronic Discovery World Zurns</title>
		<link>http://www.clearwellsystems.com/e-discovery-blog/2009/07/29/as-the-electronic-discovery-world-zurns/</link>
		<comments>http://www.clearwellsystems.com/e-discovery-blog/2009/07/29/as-the-electronic-discovery-world-zurns/#comments</comments>
		<pubDate>Wed, 29 Jul 2009 18:02:07 +0000</pubDate>
		<dc:creator>Dean Gonsowski</dc:creator>
				<category><![CDATA[cooperation proclamation]]></category>
		<category><![CDATA[e-discovery]]></category>
		<category><![CDATA[e-discovery software]]></category>
		<category><![CDATA[EDD]]></category>
		<category><![CDATA[ediscovery]]></category>
		<category><![CDATA[electronic data discovery]]></category>
		<category><![CDATA[electronic discovery]]></category>
		<category><![CDATA[ESI]]></category>
		<category><![CDATA[Judge Grimm]]></category>
		<category><![CDATA[Judge Montgomery]]></category>
		<category><![CDATA[legal discovery]]></category>
		<category><![CDATA[processing]]></category>
		<category><![CDATA[sampling]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Sedona Conference]]></category>
		<category><![CDATA[Victor Stanley]]></category>
		<category><![CDATA[Zurn]]></category>
		<category><![CDATA[certification]]></category>
		<category><![CDATA[duplication]]></category>
		<category><![CDATA[precision]]></category>
		<category><![CDATA[recall]]></category>
		<category><![CDATA[Sedona]]></category>

		<guid isPermaLink="false">http://www.clearwellsystems.com/e-discovery-blog/?p=620</guid>
		<description><![CDATA[Judge Grimm&#8217;s Victor Stanley case was lauded by many as one of the most significant electronic discovery cases of 2008, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators. &#8220;[F]or lawyers and judges to dare opine that a certain search term [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" title="As the World Zurns" src="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2009/07/zurn.jpg" alt="" width="200" height="92" /><a href="http://www.shapirosher.com/PaulW.Grimm.htm" target="_blank">Judge Grimm&#8217;s</a> <em><a href="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2008/06/victorstanleymomay29_08final.pdf" target="_blank">Victor Stanley</a></em> case was lauded by many as one of the most significant <a href="http://www.clearwellsystems.com/" target="_blank">electronic discovery</a> <a href="http://www.clearwellsystems.com/e-discovery-blog/2008/12/12/top-5-cases-that-shaped-electronic-discovery-in-2008/" target="_blank">cases of 2008</a>, mainly for its bold proclamation that e-discovery search is a much more complex and technical discipline than has been typically understood by litigators.</p>
<p>&#8220;[F]or lawyers and judges to dare opine that a certain search term or terms would be more likely to produce information than the terms that were used is truly to go where angels fear to tread.&#8221;</p>
<p>Despite legions of articles and <a href="http://www.clearwellsystems.com/e-discovery-blog/2008/06/16/%E2%80%9Cangels-tread%E2%80%9D-an-e-discovery-classic/" target="_blank">blogs</a> on the topic, at least certain portions of the bench haven&#8217;t taken heed.  In the case <em><a href="http://www.mnd.uscourts.gov/MDL-Zurn/Orders_Minutes/2009/090605-ZurnPexMotionToCompelESI.pdf" target="_blank">In re: Zurn Pex Plumbing Products Liability Litigation</a>, </em> 2009 U.S. Dist. LEXIS 47636 (June 5, 2009) (hereinafter &#8220;<em>Zurn</em>&#8221;), <a href="http://www.fjc.gov/servlet/tGetInfo?jid=1668" target="_blank">U.S. District Judge Ann Montgomery</a> receives points for understanding some basic e-discovery tenets around <a href="http://www.jerrybui.com/edd/2008/04/recall-and-precision.html" target="_blank">recall and precision</a>, but then mysteriously goes where &#8220;angels fear to tread&#8221; by suggesting her own search terms.</p>
<p>Examining the case facts in more detail,&#8230;  <em>Zurn</em> is a class action products liability case where discovery was bifurcated (as is often the case &#8211; see <em><a href="http://www.ediscoverylaw.com/uploads/file/Westlaw_Document_Spieker.doc" target="_blank">Spieker v. Quest Cherokee</a></em>) to first cover the class &#8220;certification&#8221; component.  Initially, the Magistrate partially closed the door on broader ESI discovery, stating that &#8220;while ESI may prove to be relevant to the first stage of discovery, we cannot meaningfully make that prediction now, and require the parties to engage in what could be vastly more expensive, and yet utterly futile, discovery.&#8221;  However, the Magistrate didn&#8217;t shut the door entirely, suggesting that &#8220;should the parties uncover voids in the information disclosed in hard copy form, they are . . . at liberty to press for further discovery including electronically stored information.&#8221;</p>
<p>Despite complying with <a href="http://www.thesedonaconference.org/content/tsc_cooperation_proclamation" target="_blank">Sedona&#8217;s Cooperation Proclamation</a> (&#8220;The parties have worked amicably throughout the discovery process&#8221;) opposing counsel still came to loggerheads when plaintiff found &#8220;voids&#8221; in the initial paper productions via third party discovery.  The plaintiff brought a motion to compel ESI discovery and the defendant objected, stating two primary arguments: (1) the Magistrate earlier ruled out ESI discovery and (2) if they had to perform ESI discovery it would be unduly burdensome/expensive.</p>
<p>Judge Montgomery summarily rejected the first argument, but was concerned about the burden surrounding the proposed ESI discovery.  Here, the calculations get a bit confusing, but plaintiff&#8217;s request would have resulted in 361 gigabytes of ESI from employee email sources, as well as shared &#8220;J&#8221; and &#8220;K&#8221; drives.  The defendant multiplied the gigabyte number by 75,000 pages per gigabyte, which would have required &#8220;approximately seventeen weeks and cost $ 1,150,000, exclusive of vendor collection and processing costs, to review and process the data.&#8221;  Assuming a rather modest $1,000 per gigabyte for processing and hosting costs, defendants could&#8217;ve added roughly another $400,000 for the project.</p>
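<p>The arithmetic behind those burden numbers can be laid out in a few lines. This is only a sketch using the figures quoted above: the $1,000 per gigabyte rate is the assumption made in this post, not a number from the opinion, and the opinion itself leaves unclear which page count the $1,150,000 review estimate was based on. The variable names are made up for the example.</p>

```python
GIGABYTES = 361                 # ESI volume from the plaintiff request
PAGES_PER_GB = 75_000           # multiplier used by the defendant
REVIEW_COST = 1_150_000         # dollars, defendant estimate (excludes processing)
PROCESSING_COST_PER_GB = 1_000  # dollars, assumed rate for processing and hosting

total_pages = GIGABYTES * PAGES_PER_GB
processing_cost = GIGABYTES * PROCESSING_COST_PER_GB
total_cost = REVIEW_COST + processing_cost

print(f"pages to review:  {total_pages:,}")       # about 27 million
print(f"processing cost:  ${processing_cost:,}")  # in the neighborhood of $400,000
print(f"combined cost:    ${total_cost:,}")
```

<p>The page total lines up with the 27 million pages the court mentions, and the combined figure shows how quickly processing costs add to a review estimate of this size.</p>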
<p>Ultimately, the court was not persuaded by the supporting affidavits, nor the attorney&#8217;s representations about the resulting burden:</p>
<p>&#8220;It is unclear whether Zurn&#8217;s cost and time numbers are based on a review of 27 million pages of documents, the 3.6 million pages of documents limited to the J Drive and custodians&#8217; emails, or a smaller sample of document pages likely to be flagged as a result of a search for certain relevant terms proposed by Plaintiffs. The affidavit of Ms. Freestone, an attorney and not an expert on document search and retrieval, is not compelling evidence that the search will be as burdensome as Zurn avers.&#8221;</p>
<p>The 361 gigabytes apparently resulted from &#8220;hits&#8221; corresponding to plaintiff&#8217;s 26 search terms.  The court correctly identified that those terms had precision issues (&#8220;many of Plaintiffs&#8217; proposed search terms will likely produce a large number of ‘hits&#8217; that have limited relevance in the case.&#8221;)</p>
<p>Unfortunately, in an effort to increase the search precision, the Judge did not take heed of Judge Grimm&#8217;s warning and surprisingly took matters into her own hands: &#8220;the Court will limit the search to the following fourteen terms based on the likelihood that they will  produce relevant documents without including a vast number of documents that are likely irrelevant to the litigation.&#8221;  Here is the Judge&#8217;s list of keywords:</p>
<p>(1) AADFW,<br />
(2) Corrosion,<br />
(3) Corrosive,<br />
(4) Corrosive Water,<br />
(5) Crack,<br />
(6) De-zinc,<br />
(7) Dezincification,<br />
(8) DZR,<br />
(9) Fail,<br />
(10) IMR,<br />
(11) Leak,<br />
(12) MES,<br />
(13) SCC,<br />
(14) Stress corrosion cracking</p>
<p>Without looking at the underlying data, it&#8217;s clear from the outset that Judge Montgomery didn&#8217;t craft a good search strategy (as Judge Grimm might have predicted).  For example, terms 2, 3, 4 and 14 could&#8217;ve been captured by a single <a href="http://en.wikipedia.org/wiki/Stemming" target="_blank">stemmed</a> search using the term &#8220;corros*.&#8221; Without such a stemmed search approach, the terms would probably have been run singly in the proposed protocol, meaning that each one would&#8217;ve had tremendous duplication, thereby resulting in wasted attorney review time and processing costs.</p>
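<p>A small sketch shows both points: which of the fourteen terms a single &#8220;corros*&#8221; pattern covers, and how running the terms one at a time counts the same document repeatedly. Note that the pattern below is a simple wildcard (truncation) match written as a regular expression, not a linguistic stemmer, and the three sample documents are invented for the example.</p>

```python
import re

court_terms = [
    "AADFW", "Corrosion", "Corrosive", "Corrosive Water", "Crack",
    "De-zinc", "Dezincification", "DZR", "Fail", "IMR", "Leak",
    "MES", "SCC", "Stress corrosion cracking",
]

# Wildcard equivalent of "corros*": any word starting with "corros".
wildcard = re.compile(r"\bcorros\w*", re.IGNORECASE)

# Terms from the list that the single wildcard search already covers.
covered = [term for term in court_terms if wildcard.search(term)]

# Hypothetical documents, keyed by document ID.
documents = {
    1: "Stress corrosion cracking observed in corrosive water",
    2: "Fitting showed corrosion near the crimp",
    3: "Quarterly sales report",
}

def hits_for(term):
    """IDs of documents containing the literal term, ignoring case."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return {doc_id for doc_id, text in documents.items() if pattern.search(text)}

# Running each covered term separately counts the same document several times.
separate_hits = sum(len(hits_for(term)) for term in covered)
# One wildcard search returns each matching document once.
unique_hits = len({doc_id for doc_id, text in documents.items() if wildcard.search(text)})

print(covered)
print(separate_hits, unique_hits)
```

<p>Deduplicating hits across terms would remove the double counting as well, but a single pattern keeps the protocol simpler to run and to explain.</p>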
<p>Judge Montgomery did recognize the potential error of her ways and gave the parties an out:</p>
<p>&#8220;The parties may decide on a different set of fourteen terms if they choose to do so. Additionally, if the search, as ordered by the Court, proves to be overly burdensome or costly, Zurn may renew its objection by presenting the Court with specific information including evidence from computer experts on applying the search terms, the number of documents identified, and the cost and time burdens of vetting documents.&#8221;</p>
<p>This &#8220;specific evidence&#8221; language seems to track notions from Sedona&#8217;s <a href="http://www.thesedonaconference.org/dltForm?did=Best_Practices_Retrieval_Methods___revised_cover_and_preface.pdf" target="_blank">search best practices protocol</a>, which prescribes sampling and iterative search term refinement.  What is surprising is that, knowing this, she would nevertheless <em>blindly</em> proffer the 14 term search strategy.  Instead, she should&#8217;ve quoted <em>Victor Stanley</em> and required the parties to come up with a data driven approach that met requisite precision and recall metrics.</p>
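<p>For readers who want the two metrics spelled out, here is a minimal sketch of how precision and recall are computed for one search. The document IDs and counts are hypothetical, and in practice the set of relevant documents is not known in advance and has to be estimated from a reviewed sample.</p>

```python
def precision_recall(retrieved, relevant):
    """Precision: share of retrieved documents that are relevant.
    Recall: share of relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved.intersection(relevant))
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 10 documents hit the terms, 20 documents are truly relevant,
# and only 4 documents fall in both groups.
retrieved = set(range(1, 11))
relevant = {1, 2, 3, 4}.union(range(11, 27))

precision, recall = precision_recall(retrieved, relevant)
print(f"precision: {precision:.0%}, recall: {recall:.0%}")
```

<p>A term list tuned only to cut irrelevant hits raises precision, but it says nothing about recall until someone samples the documents the search left behind.</p>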
]]></content:encoded>
			<wfw:commentRss>http://www.clearwellsystems.com/e-discovery-blog/2009/07/29/as-the-electronic-discovery-world-zurns/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Top 5 Cases That Shaped Electronic Discovery in 2008</title>
		<link>http://www.clearwellsystems.com/e-discovery-blog/2008/12/12/top-5-cases-that-shaped-electronic-discovery-in-2008/</link>
		<comments>http://www.clearwellsystems.com/e-discovery-blog/2008/12/12/top-5-cases-that-shaped-electronic-discovery-in-2008/#comments</comments>
		<pubDate>Fri, 12 Dec 2008 21:40:45 +0000</pubDate>
		<dc:creator>Dean Gonsowski</dc:creator>
				<category><![CDATA[cooperation proclamation]]></category>
		<category><![CDATA[e-discovery]]></category>
		<category><![CDATA[e-discovery software]]></category>
		<category><![CDATA[EDD]]></category>
		<category><![CDATA[ediscovery]]></category>
		<category><![CDATA[electronic data discovery]]></category>
		<category><![CDATA[electronic discovery]]></category>
		<category><![CDATA[Federal Rules of Evidence]]></category>
		<category><![CDATA[FRCP]]></category>
		<category><![CDATA[Judge Grimm]]></category>
		<category><![CDATA[keyword search]]></category>
		<category><![CDATA[legal discovery]]></category>
		<category><![CDATA[Rhoads]]></category>
		<category><![CDATA[Sedona Conference]]></category>
		<category><![CDATA[Victor Stanley]]></category>
		<category><![CDATA[Flagg v. City of Detroit]]></category>
		<category><![CDATA[In re Seroquel]]></category>
		<category><![CDATA[Inc. v. Bldg. Materials]]></category>
		<category><![CDATA[Mancia v. Mayflower]]></category>
		<category><![CDATA[Sedona]]></category>
		<category><![CDATA[top 5]]></category>

		<guid isPermaLink="false">http://www.clearwellsystems.com/e-discovery-blog/?p=265</guid>
		<description><![CDATA[Picking five out of the sea of electronic discovery cases isn&#8217;t as easy as it sounds.  Sure, a few, like our &#8220;Case of the Year&#8221; will be no-brainers, but others aren&#8217;t as clear cut.  And, they&#8217;re certainly open to debate.  But, in my humble opinion here&#8217;s THE list, counting down David Letterman style: 5) Mancia [...]]]></description>
			<content:encoded><![CDATA[
<p><img class="alignnone size-full wp-image-267" title="top5-4" src="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2008/12/top5-4.jpg" alt="" width="210" height="276" />Picking five out of the sea of <a title="electronic discovery, e-discovery, ediscovery, legal discovery" href="http://www.clearwellsystems.com/e-discovery-central/index.php" target="_blank">electronic discovery</a> cases isn&#8217;t as easy as it sounds.  Sure, a few, like our &#8220;Case of the Year&#8221; will be no-brainers, but others aren&#8217;t as clear cut.  And, they&#8217;re certainly open to debate.  But, in my humble opinion here&#8217;s THE list, counting down David Letterman style:</p>
<p><strong>5) <em>Mancia v. Mayflower Textile Servs. Co</em>., 2008 WL 4595175 (D. Md. Oct. 15, 2008)</strong></p>
<p>If there ever was an opinion written by a judge to make a larger societal point, <em>Mancia</em> was certainly it.  Judge Paul Grimm, who&#8217;ll appear on this list in another slot as well, has clearly taken the mantle from Judge Scheindlin as the leading electronic discovery jurist.  He had already authored a number of significant opinions in this area, including <em>Hobson</em> and <em>Thompson</em>.  Now, in <em>Mancia</em>, he used a garden-variety discovery dispute, rife as usual with boilerplate objections and other obstreperous tactics, to highlight the <a href="http://www.clearwellsystems.com/e-discovery-blog/2008/11/17/the-sedona-cooperation-proclamation-and-the-case-for-collaboration/" target="_blank">Sedona Conference&#8217;s Cooperation Proclamation</a>.</p>
<p>The lasting takeaway from the opinion is the notion that &#8220;[c]ourts repeatedly have noted the need for attorneys to work cooperatively to conduct electronic discovery, and sanctioned lawyers and parties for failing to do so.&#8221; To support this notion he cites the <a href="http://www.thesedonaconference.org/" target="_blank">Sedona Conference</a> Proclamation and the little-used FRCP 26(g).  This opinion is noteworthy because it provides precedent that bolsters the Sedona initiative, and it should give a ready citation to any counsel who isn&#8217;t getting the level of cooperation needed from the opposition.  It remains to be seen if other judges will follow suit, but this could be the beachhead for a more cooperative electronic discovery process in 2009 and beyond.</p>
<p><strong>4) <em>Flagg v. City of Detroit</em>, 252 F.R.D. 346 (E.D. Mich. 2008)</strong></p>
<p><em>Flagg</em> highlights the growing need to reconcile the electronic discovery landscape, which typically focuses somewhat myopically on email, with the larger informational trends now characterized by the use of blogs, social networking sites, instant messaging, and text messaging.  <em>Flagg</em> was one of the first decisions to hold that text messages (e.g., messages exchanged among certain officials and employees of the City of Detroit via city-issued text messaging devices) were discoverable under the standards of FRCP 26(b)(1).  The holding further demonstrated the challenges of conducting electronic discovery across information systems that mix personal information with business communications.  This type of information commingling will continue to escalate, causing significant long-term electronic discovery challenges due to thorny privacy, privilege and policy implications.</p>
<p><strong>3) <em>Rhoads Indus., Inc. v. Bldg. Materials Corp. of Am.</em>, 2008 WL 4916026 (E.D. Pa. Nov. 14, 2008)</strong></p>
<p><em>Rhoads</em> is one of the first cases decided after the enactment of Federal Rule of Evidence (FRE) 502, which recently created a national standard (versus the previous split in jurisdictions) and now states a &#8220;middle ground&#8221; for handling inadvertent disclosure during electronic discovery.  The key provision is (b)(2), which provides protection only if &#8220;the holder of the privilege or protection took reasonable steps to prevent disclosure.&#8221;  <em>Rhoads</em> took that &#8220;reasonableness&#8221; question head on in a scenario where the plaintiff, Rhoads, admittedly (yet inadvertently) produced over eight hundred privileged electronic documents.  The decision is significant because it used the five-factor test stated in <em>Fidelity</em>, but put undue weight on the final factor: &#8220;whether the overriding interests of justice would be served by relieving the party of its errors.&#8221;  This approach potentially threatens the development of the sound case law needed to put FRE 502 into practice, because its weighting of &#8220;fairness&#8221; (a problematically vague notion) injects too much uncertainty into the analysis.  It will be interesting to see whether and how this approach is adopted as we enter the New Year.</p>
<p><strong>2) <em>Qualcomm Inc. v. Broadcom Corp.</em>, 2008 WL 66932 (S.D. Cal. Jan. 7, 2008)</strong></p>
<p>For <a href="http://ralphlosey.wordpress.com/2008/12/15/krolls-report-and-analysis-of-the-most-significant-e-discovery-cases-in-2008/" target="_blank">many</a>, this was the case of the year given its far-reaching implications for the legal community.  Some have argued that this isn&#8217;t an e-discovery abuse case per se, but more of an example of discovery abuses that just so happened to be centered around ESI.  In either case, the fraud, resulting cover-up, sanctions, ethical issues and privilege discussions made for insightful and thought-provoking reading throughout 2008.  The lasting takeaway from <em>Qualcomm</em> appears to be the consequences not just of committing discovery abuses, but of failing to have a well-thought-out e-discovery plan that is actively executed and monitored by outside counsel.  The resulting tension between outside counsel, inside counsel and the internal IT department may continue to escalate if more cases like this make the headlines in 2009.</p>
<p><strong>1) E-Discovery Case of the Year: <em><a href="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2008/06/victorstanleymomay29_08final.pdf" target="_blank">Victor Stanley, Inc. v. Creative Pipe, Inc.</a></em>, 2008 WL 2221841 (D. Md. May 29, 2008)</strong></p>
<p>Judge Grimm&#8217;s hallmark opinion has had the legal community buzzing over the past several months, and the reason appears pretty straightforward.  In <em>Victor Stanley</em>, Grimm builds on the holdings in <em>Seroquel</em>, <em>O&#8217;Keefe</em> and <em>Equity Analytics</em> to boldly cast doubt on a practice so routine that it has shocked the legal community into reevaluation:<br />
<br />
<em>&#8220;[D]etermining whether a particular search methodology, such as keywords, will or will not be effective certainly requires knowledge beyond the ken of a lay person (and a lay lawyer) . . . .&#8221;</em></p>
<p>The notion that electronic discovery search is beyond the ability of most attorneys has caused tremors within the litigation support community, which has a long history of blindly receiving keywords from counsel, running them and handing back the results &#8211; often blissfully unaware of the extent to which those keyword searches actually located relevant information.  <em>Victor Stanley</em>&#8217;s analysis of the &#8220;reasonableness&#8221; of search protocols also bears on FRE 502, which cements its place alongside other e-discovery &#8220;must reads&#8221; such as <em>Zubulake</em> and <em>Morgan Stanley</em>.</p>
<p>The cases above are my Top 5.  What additional cases do you think were important?  Please let me know by commenting on the cases you think shaped electronic discovery in 2008 and why.</p>
<p>Learn more on: <a href="http://www.clearwellsystems.com/e-discovery-101/frcp-electronic-discovery.php">FRCP Electronic Discovery</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clearwellsystems.com/e-discovery-blog/2008/12/12/top-5-cases-that-shaped-electronic-discovery-in-2008/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Demystifying Concept Search in Electronic Discovery</title>
		<link>http://www.clearwellsystems.com/e-discovery-blog/2008/10/28/demystifying-concept-search-in-electronic-discovery/</link>
		<comments>http://www.clearwellsystems.com/e-discovery-blog/2008/10/28/demystifying-concept-search-in-electronic-discovery/#comments</comments>
		<pubDate>Wed, 29 Oct 2008 03:50:12 +0000</pubDate>
		<dc:creator>Will Uppington</dc:creator>
				<category><![CDATA[concept categorization]]></category>
		<category><![CDATA[concept search]]></category>
		<category><![CDATA[e-discovery]]></category>
		<category><![CDATA[e-discovery services]]></category>
		<category><![CDATA[e-discovery software]]></category>
		<category><![CDATA[ediscovery]]></category>
		<category><![CDATA[electronic data discovery]]></category>
		<category><![CDATA[Judge Grimm]]></category>
		<category><![CDATA[keyword search]]></category>
		<category><![CDATA[legal discovery]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[Sedona Conference]]></category>
		<category><![CDATA[Victor Stanley]]></category>
		<category><![CDATA[Bayesian classification]]></category>
		<category><![CDATA[discovery]]></category>
		<category><![CDATA[documents]]></category>
		<category><![CDATA[ediscovery software]]></category>
		<category><![CDATA[Judge Facciola]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[Sedona]]></category>
		<category><![CDATA[Sedona Working Group]]></category>

		<guid isPermaLink="false">http://www.clearwellsystems.com/e-discovery-blog/?p=198</guid>
		<description><![CDATA[Concept or content search continues to be a hot topic within the e-discovery community.  There&#8217;s a continuous stream of articles that discuss it.  Some that point out the positive.  Others that point out the limitations.  The courts have also gotten involved in the discussion.  Judge Grimm refers to concept search in e-discovery in Victor Stanley, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2008/10/mist.jpg"><img class="alignnone size-full wp-image-202" title="mist" src="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2008/10/mist.jpg" alt="" width="230" height="157" /></a>Concept or content search continues to be a hot topic within the e-discovery community.  There&#8217;s a continuous stream of articles that discuss it: <a href="http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=1202422337225" target="_blank">some</a> point out the positives, while <a href="http://www.law.com/jsp/legaltechnology/pubArticleLT.jsp?id=900005509469" target="_blank">others</a> point out the limitations.  The courts have also gotten involved in the discussion.  <a href="http://www.clearwellsystems.com/e-discovery-blog/2008/06/16/%E2%80%9Cangels-tread%E2%80%9D-an-e-discovery-classic/" target="_blank">Judge Grimm</a> refers to concept search in e-discovery in <a href="http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2008/06/victorstanleymomay29_08final.pdf" target="_blank"><em>Victor Stanley, Inc. v. Creative Pipe, Inc.</em></a>, 2008 WL 2221841 (D. Md. May 29, 2008).  <a href="http://ralphlosey.wordpress.com/2007/06/10/keyword-searches-v-concept-searches/" target="_blank">Judge Facciola discusses</a> concept search in <em>Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority</em>, 242 F.R.D. 139 (D.D.C. 2007), and other opinions.  Despite (or maybe because of) all the commentary on this topic, I find that while a lot of people think that concept search in e-discovery is good, many are not fully sure of exactly what concept search is or how it is practically useful in <a title="e-discovery, ediscovery, legal discovery, electronic discovery, electronic data discovery" href="http://www.clearwellsystems.com/e-discovery-central/index.php" target="_blank">e-discovery</a>.
It&#8217;s pretty clear that after several years of commentary and hype, concept search has become something of a buzzword associated with many myths and misconceptions.  In an effort to better understand what concept search is and how it can help in e-discovery, I want to dispel two of the most common myths I have heard.</p>
<p><strong>The &#8220;Concept Search is Concept Search&#8221; Myth</strong></p>
<p>The first myth around concept search actually revolves around what it is.  In my experience, people tend to lump two different technologies together when talking about concept search: <strong>concept search</strong> and <strong>concept categorization</strong>.  It&#8217;s very common, for example, to see commentators say concept search even when what they are really talking about is concept categorization.  To make matters more confusing, people also use a plethora of other names including content search, content clustering or concept clustering when what they really mean is concept categorization.</p>
<p>So, what are the differences between concept search and concept categorization?  First, let&#8217;s start with concept search.  Concept search technologies find documents containing &#8220;concepts&#8221;.  I think that the Sedona Conference&#8217;s &#8220;<a href="http://www.thesedonaconference.org/dltForm?did=Best_Practices_Retrieval_Methods___revised_cover_and_preface.pdf" target="_blank">Best Practices Commentary on the Use of Search &amp; Information Retrieval Methods in E-Discovery</a>&#8221; provides a good definition of &#8220;concept&#8221; when used in a search context: &#8220;the combination of [a] query term and the additional terms identified by the thesaurus.&#8221;  In other words, concept search technologies find documents containing a specified term plus additional terms with similar meanings derived from a thesaurus.</p>
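<p>As a rough illustration of that definition, here is a minimal Python sketch.  The thesaurus, the documents and the function names are all made up for this example; real concept search products are considerably more sophisticated:</p>

```python
# Toy thesaurus, invented for illustration; a real product would use a far larger one.
THESAURUS = {
    "leak": ["seep", "drip"],
    "crack": ["fracture", "split"],
}

def expand_to_concept(query_term):
    """Return the query term plus the related terms the thesaurus lists for it."""
    return [query_term] + THESAURUS.get(query_term, [])

def concept_search(query_term, docs):
    """Ids of documents containing the query term or any of its thesaurus terms."""
    concept = expand_to_concept(query_term)
    return [doc_id for doc_id, text in docs.items()
            if any(term in text.lower().split() for term in concept)]

# Hypothetical documents.
docs = {
    1: "Water began to seep from the fitting",
    2: "The valve did not leak",
    3: "Quarterly sales report",
}

print(expand_to_concept("leak"))      # ['leak', 'seep', 'drip']
print(concept_search("leak", docs))   # [1, 2]
```

<p>A plain keyword search for &#8220;leak&#8221; would have found only document 2; the thesaurus expansion is what pulls in document 1.</p>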
<p>Concept categorization, on the other hand, is actually not a search technology at all.  Concept categorization technologies do not &#8220;find&#8221; documents.  Rather, they categorize or group documents based on their similarity.  There are many different ways to group documents based on similarity.  Techniques include statistical methods (which assess similarity based on word frequency), Bayesian classification (which weights words differently depending on factors in addition to statistical frequency, such as where the terms appear in a document), and semantic indexing (which takes into account the fact that many words used in a similar context may have a similar meaning).  It would take more time to describe these technologies in detail, but the Sedona <a href="http://www.thesedonaconference.org/dltForm?did=Best_Practices_Retrieval_Methods___revised_cover_and_preface.pdf" target="_blank">commentary</a> has a good summary of them if you are interested in learning more.</p>
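<p>To give a feel for the simplest of these, the statistical approach, the sketch below groups documents by the similarity of their word counts.  It is only a toy: the documents, the 0.5 threshold and the greedy grouping rule are my own assumptions, not how any particular product works:</p>

```python
import math
from collections import Counter

def word_counts(text):
    """Word-frequency vector for a document."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_documents(docs, threshold=0.5):
    """Greedy grouping: add each document to the first group whose
    first member is similar enough, otherwise start a new group."""
    groups = []
    for doc_id, text in docs.items():
        vector = word_counts(text)
        for group in groups:
            if cosine_similarity(vector, group["seed"]) >= threshold:
                group["ids"].append(doc_id)
                break
        else:
            groups.append({"seed": vector, "ids": [doc_id]})
    return [group["ids"] for group in groups]

# Hypothetical documents.
docs = {
    1: "brass fitting leak report",
    2: "leak report for brass fitting",
    3: "holiday party invitation",
}

print(group_documents(docs))   # [[1, 2], [3]]
```

<p>Note that nothing was searched for here: no query term was supplied, and the documents were simply sorted into groups.</p>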
<p>As should now be apparent, these technologies are very different, and using the same words to describe them is confusing.  That&#8217;s why it&#8217;s not surprising that a lot of the users of e-discovery services and software don&#8217;t have a strong understanding of what these technologies are or what benefits they can actually provide in practice.  Dispelling the myth that they can be lumped together is a critical first step in any conversation about concept search and how it can help in e-discovery.  This leads us to a second myth: that concept search is better than keyword search.  I&#8217;ll discuss this in my next blog post.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.clearwellsystems.com/e-discovery-blog/2008/10/28/demystifying-concept-search-in-electronic-discovery/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

