Archive for the ‘keyword search’ Category

The Federal Rules of California

Thursday, September 17th, 2009

On August 14, 2009, the California Judicial Council amended its Rules of Court to augment discussion of electronic discovery issues during the meet and confer process.

Rule of Court 3.724 was amended to require discussion of “Any issues relating to the discovery of electronically stored information” no later than 30 calendar days before the date set for the initial case management conference.  The broad language (i.e., “any”) was augmented by eight specific categories that must be expressly discussed:

(A) Issues relating to the preservation of discoverable electronically stored information;

(B) The form or forms in which information will be produced;

(C) The time within which the information will be produced;

(D) The scope of discovery of the information;

(E) The method for asserting or preserving claims of privilege or attorney work product, including whether such claims may be asserted after production;

(F) The method for asserting or preserving the confidentiality, privacy, trade secrets, or proprietary status of information relating to a party or person not a party to the civil proceedings;

(G) How the cost of production of electronically stored information is to be allocated among the parties;

(H) Any other issues relating to the discovery of electronically stored information, including developing a proposed plan relating to the discovery of the information;

Many of these issues track FRCP language (including forms of production, preservation, privilege issues, etc.).  However, section G seems somewhat novel given the historical “American Rule” where the producing party is required to bear all necessary costs of production.

Curiously missing, in comparison with FRCP 26(b)(2)(B), is the need to discuss the handling of “inaccessible” ESI, although this could easily be subsumed in the “any other issues” language of section H.  Also missing is a discussion about proposed searching and/or culling protocols (aka “keyword negotiations”), which are often part of the core meet and confer topics in federal court.

Nevertheless, the scope is broad enough to require *a* discussion of all likely relevant electronic discovery issues, which was often lacking historically.  Once that discussion starts, reasonably savvy counsel should be able to flesh out most of the significant issues.  And, given this broad language, a judge would presumably give counsel a hard time for any material omissions.

Top 5 Cases That Shaped Electronic Discovery in 2008

Friday, December 12th, 2008

Picking five out of the sea of electronic discovery cases isn’t as easy as it sounds.  Sure, a few, like our “Case of the Year” will be no-brainers, but others aren’t as clear cut.  And, they’re certainly open to debate.  But, in my humble opinion here’s THE list, counting down David Letterman style:

5) Mancia v. Mayflower Textile Servs. Co., 2008 WL 4595175 (D. Md. Oct. 15, 2008)

If there ever was an opinion written by a judge to make a larger societal point, Mancia was certainly it.  Judge Paul Grimm, who’ll appear on this list in another slot as well, has clearly taken the mantle from Judge Scheindlin as the leading electronic discovery jurist.  He’d heretofore authored a number of significant opinions in this area, including Hopson and Thompson.  Now, in Mancia, he used a garden-variety discovery dispute, which was typically rife with boilerplate objections and other obstreperous tactics, to highlight the Sedona Conference’s Cooperation Proclamation.

The lasting takeaway from the opinion is the notion that “[c]ourts repeatedly have noted the need for attorneys to work cooperatively to conduct electronic discovery, and sanctioned lawyers and parties for failing to do so.”  To support this notion he cites the Sedona Conference Proclamation and the little-used FRCP 26(g).  This opinion is noteworthy because it provides precedent that bolsters the Sedona initiative and should provide a ready citation for all those counsel who aren’t getting the level of cooperation they need from the opposition.  It remains to be seen if other judges will follow suit, but this could be the beachhead for a more cooperative electronic discovery process in 2009 and beyond.

4) Flagg v. City of Detroit, 252 F.R.D. 346 (E.D. Mich. 2008)

Flagg highlights the growing need to reconcile the electronic discovery landscape, which typically focuses somewhat myopically on email, with the larger informational trends, which are now characterized by the use of blogs, social networking sites, instant messaging, and text messaging.  Flagg was one of the first cases to determine that text messages (e.g., messages exchanged among certain officials and employees of the City of Detroit via city-issued text messaging devices) were discoverable under the standards of FRCP 26(b)(1).  The holding further demonstrated the challenges of conducting electronic discovery across information systems that mix personal information with business communications.  This type of information commingling will continue to escalate, causing significant long-term electronic discovery challenges due to thorny privacy, privilege and policy implications.

3) Rhoads Indus., Inc. v. Bldg. Materials Corp. of Am., 2008 WL 4916026 (E.D. Pa. Nov. 14, 2008)

Rhoads is one of the first cases decided after the enactment of Federal Rule of Evidence (FRE) 502, which recently created a national standard (versus the previous split among jurisdictions) and now states a “middle ground” for determining when inadvertent disclosure during electronic discovery waives privilege.  The key provision is (b)(2), which provides protection only if “the holder of the privilege or protection took reasonable steps to prevent disclosure.”  So, Rhoads took that “reasonableness” question head on in a scenario where the plaintiff Rhoads admittedly (yet inadvertently) produced over eight hundred privileged electronic documents.  The decision is significant because it used the five-factor test stated in Fidelity, but put undue weight on the final factor, which was: “whether the overriding interests of justice would be served by relieving the party of its errors.”  This approach potentially threatens the development of the sound case law needed to put FRE 502 into practice, because its weighting of “fairness” (a problematically vague notion) injects too much uncertainty into the analysis.  It will be interesting to see if/how this approach is subsequently adopted as we enter the New Year.

2) Qualcomm Inc. v. Broadcom Corp., 2008 WL 66932 (S.D. Cal. Jan. 7, 2008)

This, for many, was the case of the year given its far-reaching implications for the legal community.  Some have argued that this isn’t an e-discovery abuse case per se, but more of an example of discovery abuses that just so happened to be centered around ESI.  In either case, the fraud, resulting cover-up, sanctions, ethical issues and privilege discussions made for insightful and thought-provoking reading throughout 2008.  The lasting takeaway from Qualcomm appears to be the consequences not just of committing discovery abuses, but of failing to have a well-thought-out e-discovery plan that is actively executed and monitored by outside counsel.  The resulting tension between outside counsel, inside counsel and the internal IT department may continue to escalate if more cases like this make the headlines in 2009.

1)  E-Discovery Case of the Year: Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008)

Judge Grimm’s hallmark opinion has had the legal community buzzing over the past several months and the reason appears pretty straightforward.  In Victor Stanley, Grimm builds on the holdings in Seroquel, O’Keefe and Equity Analytics to boldly cast doubt on a practice so routine that the challenge has shocked the legal community into reevaluation:

“[D]etermining whether a particular search methodology, such as keywords, will or will not be effective certainly requires knowledge beyond the ken of a lay person (and a lay lawyer) . . . .”

The notion that electronic discovery search is beyond the ability of most attorneys has caused tremors within the litigation support community, which had a long history of blindly receiving keywords from counsel, running them and turning the results back over, often blissfully unaware of the extent to which those keyword searches actually located relevant information.  Victor Stanley’s analysis of the “reasonableness” of search protocols also has an impact on FRE 502 analysis, which cements its place alongside other e-discovery “must reads” such as Zubulake and Morgan Stanley.

The cases above are my Top 5.  What additional cases do you think were important?  Please let me know by commenting on the cases you think shaped electronic discovery in 2008 and why.

Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth.  The myth is that there is an agreed upon definition of concept search.  In actuality, when people in e-discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not actually talking about concept search technology at all and are actually talking about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling.  First, culling needs to be transparent.  You need to be able to get agreement with or at least explain to the opposing side and the court exactly how you have culled the data set.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either agree on the set of search terms with the opposing side or to use e-discovery search best practices to perform keyword searches on your own.

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that your group of documents that have been categorized as being related to, for example, a company’s hiring policies include all of the documents in your matter related to hiring policies or that they do not include some documents that may not really be related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.
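Over- and under-inclusiveness are usually put into numbers as precision (what fraction of the documents you pulled back are actually relevant) and recall (what fraction of the relevant documents you pulled back).  Here is a minimal sketch of that arithmetic in Python.  The document IDs and the "hiring policies" judgments are made up purely for illustration:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Low precision = over-inclusive; low recall = under-inclusive.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical matter: five documents really are about hiring policies.
relevant = {1, 2, 3, 4, 5}
# Hypothetical category (or keyword search) result: four documents.
retrieved = {1, 2, 3, 9}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.75 recall=0.60
```

In this toy case the category is both over-inclusive (document 9 doesn't belong) and under-inclusive (documents 4 and 5 were missed), which is exactly the point: no single technology gets both numbers to 100%.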

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.
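To make the “terminate” example concrete, here is a toy sketch of thesaurus-based query expansion in Python.  The thesaurus, the documents and the function names are all hypothetical, and the matching is deliberately naive (exact lowercase words, no stemming), so treat it as an illustration of the trade-off rather than how any particular product works:

```python
import re

# Hypothetical one-entry thesaurus for illustration.
THESAURUS = {"terminate": ["fire", "dismiss"]}

# Hypothetical documents.
DOCS = {
    "d1": "We decided to terminate his employment on Friday.",
    "d2": "HR will fire the contractor next week.",
    "d3": "The fire department inspected the warehouse.",
    "d4": "Quarterly sales figures are attached.",
}


def tokens(text):
    """Lowercase word tokens; no stemming, so 'fired' would not match 'fire'."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term, docs):
    """Return IDs of documents containing the exact term."""
    return sorted(d for d, text in docs.items() if term in tokens(text))


def concept_search(term, docs):
    """Return IDs of documents containing the term or any thesaurus synonym."""
    terms = {term, *THESAURUS.get(term, [])}
    return sorted(d for d, text in docs.items() if terms & tokens(text))


print(keyword_search("terminate", DOCS))  # ['d1']
print(concept_search("terminate", DOCS))  # ['d1', 'd2', 'd3']
```

The expanded search picks up d2, which plain keyword search missed, but it also drags in d3, the fire department email.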

So concept search can help address the under-inclusive problem with keyword search (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with content categorization.  These technologies have largely been designed as “black boxes”, which, as I have discussed in the past, makes sense for enterprise search but not for e-discovery search, and, as a result, could also be difficult to explain and defend.  For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.

Demystifying Concept Search in Electronic Discovery

Tuesday, October 28th, 2008

Concept or content search continues to be a hot topic within the e-discovery community.  There’s a continuous stream of articles that discuss it.  Some point out the positives; others point out the limitations.  The courts have also gotten involved in the discussion.  Judge Grimm refers to concept search in e-discovery in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008).  Judge Facciola discusses concept search in Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 242 F.R.D. 139 and other opinions.  Despite (or maybe because of) all the commentary on this topic, I find that while a lot of people think that concept search in e-discovery is good, many are not fully sure of exactly what concept search is or how it is practically useful in e-discovery.  It’s pretty clear that after several years of commentary and hype, concept search has become something of a buzzword associated with many myths and misconceptions.  In an effort to better understand what concept search is and how it can help in e-discovery, I want to dispel two of the most common myths I have heard.

The “Concept Search is Concept Search” Myth

The first myth around concept search actually revolves around what it is.  In my experience, people tend to lump two different technologies together when talking about concept search: concept search and concept categorization.  It’s very common, for example, to see commentators say concept search even when what they are really talking about is concept categorization.  To make matters more confusing, people also use a plethora of other names including content search, content clustering or concept clustering when what they really mean is concept categorization.

So, what are the differences between concept search and concept categorization?  First, let’s start with concept search.  Concept search technologies find documents containing “concepts”.  I think that the Sedona Conference’s “Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery” provides a good definition of “concept” when used in a search context: “the combination of [a] query term and the additional terms identified by the thesaurus.”  In other words, concept search technologies find documents containing a specified term plus additional terms with similar meanings derived from a thesaurus.

Concept categorization, on the other hand, is actually not a search technology at all.  Concept categorization technologies do not “find” documents.  Rather, they categorize or group documents based on their similarity.  There are many different ways to group documents based on similarity.  Techniques include statistical methods (which assess similarity based on word frequency), Bayesian classification (which weights words differently depending on factors in addition to statistical frequency, such as where the terms appear in a document), and semantic indexing (which takes into account the fact that many words used in a similar context may have a similar meaning).  It would take more time to describe these technologies in detail, but the Sedona commentary has a good summary of these different technologies if you are interested in learning more.
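For a feel of what “grouping by word frequency” means, here is a deliberately simple Python sketch: each document becomes a bag of word counts, similarity is the cosine between two bags, and a document joins the first group whose seed document is similar enough.  The greedy grouping rule, the 0.5 threshold and the sample texts are my own assumptions for illustration; real categorization engines are considerably more sophisticated:

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words vector: word -> number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(docs, threshold=0.5):
    """Greedy grouping: join the first group whose seed is similar enough."""
    groups = []
    for doc_id, text in docs.items():
        vec = term_counts(text)
        for group in groups:
            if cosine(vec, group["seed"]) >= threshold:
                group["members"].append(doc_id)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append({"seed": vec, "members": [doc_id]})
    return [g["members"] for g in groups]


# Hypothetical documents.
DOCS = {
    "a": "hiring policy for new engineers",
    "b": "new hiring policy update for engineers",
    "c": "warehouse fire safety inspection",
    "d": "fire safety inspection schedule",
}

print(group_by_similarity(DOCS))  # [['a', 'b'], ['c', 'd']]
```

Notice that nothing here “finds” anything in response to a query.  The code only sorts documents into piles, which is why lumping categorization together with search is misleading.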

As should now be apparent, these technologies are very different and using the same words to describe them is confusing.  It’s not surprising, then, that a lot of the users of e-discovery services and software don’t have a strong understanding of what these technologies are or what benefits they can actually provide in practice.  Dispelling the myth that they can be lumped together is a critical first step in any conversation about concept search and how it can help in e-discovery.  This leads us to a second myth: that concept search is better than keyword search.  I’ll discuss this in my next blog post.

The “Artful” E-Discovery Dodger

Monday, October 13th, 2008

E-Discovery search has become a hot topic of late (in blogs and in the news), and I think it’s pretty clear that the unwashed (attorney) masses still don’t really grok the importance of using a defensible search protocol.  Neither do they seem to understand the enhanced scrutiny that’s being applied by the judiciary.

Kipperman v. Onex Corp., 2008 WL 4372005 (N.D. Ga. Sept. 19, 2008) is another in what will assuredly be a long string of cases that demonstrate how easy it is for litigators to get wrapped around the axle of e-discovery search.  In Kipperman, the defendant (Onex) presented several motions to the court, including attempts to obtain relief from the need to produce email identified after searching several backup tapes.

During a previous hearing the court ordered Onex to search all the mailboxes on two tapes, as well as on an additional tape selected by Plaintiff. The court determined that despite Onex’s objections and representations, the backup tapes were “producing meaningful discoverable information.”  The court was nevertheless sympathetic to Onex’s burden and therefore weighed in with some guidance:

“The court did suggest, … , that Plaintiff be more artful with its search terms and that Plaintiff utilize a list of the people, provided by Defendants, to review whether all mailboxes needed to be searched.”

The court also gave Onex the chance to narrow the search terms.  Unfortunately, they didn’t seize the opportunity to provide a narrower list or a refinement of their search terms.  “As such, they agreed to search and restore all the mailboxes with the search terms provided by Plaintiff.”

Not surprisingly, Onex then sought relief from having to review and produce all of the results from the search because the “broad search terms resulted in thousands and thousands of irrelevant hits.”  For example, the search terms included the word “republic”, which was used to elicit emails regarding Republic Builders Products, one of the companies involved in this matter.

“Defendants claim that the search captured thousands of irrelevant pages due to one occurrence of the word ‘republic’ often related to Onex business interests having nothing to do with Magnatrax in the ‘Republic of France,’ ‘Republic of Ireland,’ and ‘Czech Republic’.”

Again the court reaffirmed its sympathy with Onex’s burden and yet denied the requested relief, in large part because Onex had been given the chance to be more “artful” and had not taken it:

“[T]he court is not unsympathetic to the massive amount of discovery involved in this matter, the considerable burden of working with it, and the overproduction that often comes with e-mail production. Therefore, the court gave Defendants numerous tools by which to reduce the burden of e-mail discovery, including an opportunity to limit Plaintiff’s search terms and an opportunity to provide a list by which the number of peoples and the number of boxes being searched could be reduced. Defendants did not take advantage of these opportunities. Defendants must now lie in the bed that they have made. Thus, Defendants’ objections on the basis of relevancy and volume are DENIED.” (emphasis added).

Needless to say, Kipperman is probably not all that atypical.  Attorneys everywhere have historically used blunt e-discovery search instruments and haven’t often run afoul of the judiciary.  Now, post Victor Stanley, et al., the playing field has changed dramatically.  It’s important to leverage best practices (from Sedona and others), craft a defensible search strategy, sample the results and “show your work.”  Missteps along the way, especially ones that the court has tried to help the parties avoid, won’t be met with much tolerance.
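As a rough illustration of what “artful” and “sample the results” could look like in practice, here is a small Python sketch.  It narrows the “republic” term by ignoring the country-name phrases Onex complained about, then draws a reproducible random sample of hits for manual review.  The exclusion list, the emails and the function names are hypothetical, and nothing here reflects what the parties in Kipperman actually ran:

```python
import random
import re

# Hypothetical exclusion phrases, based on the false hits described above.
EXCLUDE = re.compile(
    r"\b(?:czech republic|republic of (?:france|ireland))\b", re.IGNORECASE
)
TERM = re.compile(r"\brepublic\b", re.IGNORECASE)


def artful_hit(text):
    """True if 'republic' still appears once country-name phrases are removed."""
    return bool(TERM.search(EXCLUDE.sub(" ", text)))


def sample_hits(hit_ids, k, seed=0):
    """Draw a reproducible random sample of hits for manual relevance review."""
    rng = random.Random(seed)  # fixed seed so the sample can be re-created
    hit_ids = sorted(hit_ids)
    return rng.sample(hit_ids, min(k, len(hit_ids)))


# Hypothetical emails.
EMAILS = {
    "e1": "Republic Builders Products invoice attached.",
    "e2": "Our office in the Czech Republic is closed Monday.",
    "e3": "Trip to the Republic of Ireland, then a call with Republic Builders.",
    "e4": "Lunch on Thursday?",
}

broad = [i for i, t in EMAILS.items() if TERM.search(t)]
narrow = [i for i, t in EMAILS.items() if artful_hit(t)]
print(broad)   # ['e1', 'e2', 'e3']
print(narrow)  # ['e1', 'e3']

# Reviewers would then judge a sample of the narrowed hits to estimate
# how many are actually relevant before committing to the term.
review_set = sample_hits(narrow, k=2)
```

Keeping the seed and the exclusion list on record is one way to “show your work” if the methodology is later challenged.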

“I Missed The Boat”

Thursday, March 15th, 2007

The other day, I heard that a local technology company had a lot of pain around e-discovery. Within hours, one of our board members had contacted the General Counsel, who informed us that the company had purchased an e-discovery product just three weeks earlier.

That evening over dinner, I summarized the situation by saying to my wife that “I missed the boat.” My 3-year-old son, who was also at the table, immediately started to quiz me: “You missed the boat? The boat left without you? You were late so the boat had to go? There wasn’t room on the boat for you?” For days afterwards, when I left for work, he would ask me: “Are you going on the boat today?”

The thing that really struck me is that we often speak in metaphors, and it is not just 3-year-olds who have trouble understanding. One of our customers is a large manufacturing company. On deploying Clearwell to analyze its email, the company discovered a large number of messages with the expression: “The eagle has landed”. That struck them as rather odd, so they investigated and discovered that a group of employees were illegally selling company equipment on the grey market, and every time they made a sale, they would let those involved know by sending out an email saying “The eagle has landed.” Viewed alone, the expression looks innocent enough; once viewed in the context of emails flowing back and forth, it was clearly a statement of guilt.

Examples like this illustrate why simple keyword search is not enough. Companies need more sophisticated tools which, among other things, group emails into topic areas and link them into discussion threads, to surface coded expressions and analyze them in context. To rely on keyword search alone would be like someone at Procter & Gamble seeking to analyze point-of-sale data from Walmart with a pocket calculator.
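To give a flavor of how a recurring coded expression might be surfaced without knowing the keyword in advance, here is a toy Python sketch that counts four-word phrases appearing in several different messages.  This is only an illustration under my own assumptions (the phrase length, the threshold and the sample messages are made up), not a description of how Clearwell or any other product works:

```python
import re
from collections import Counter


def ngrams(text, n=4):
    """Set of distinct n-word phrases in a message (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def recurring_phrases(messages, n=4, min_messages=3):
    """Phrases that occur in at least `min_messages` different messages."""
    counts = Counter()
    for body in messages:
        # ngrams() returns a set, so each message counts a phrase only once.
        counts.update(ngrams(body, n))
    return [(p, c) for p, c in counts.most_common() if c >= min_messages]


# Hypothetical message bodies.
MESSAGES = [
    "The eagle has landed.",
    "FYI - the eagle has landed. Talk soon.",
    "The eagle has landed again, same place.",
    "Please review the attached budget before Friday.",
    "Lunch at noon?",
]

print(recurring_phrases(MESSAGES))  # [('the eagle has landed', 3)]
```

A phrase count like this only flags that something odd keeps recurring.  It still takes the surrounding thread, and a human, to work out what the phrase means.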

Of course, that doesn’t help me explain missing the boat to my son. So over the weekend, we took him on the ferry between San Francisco and Sausalito – just so he knows I don’t miss the boat every time.