Archive for the ‘Transparent Search’ Category

The Business Strategy Behind Clearwell’s Transparent Concept Search

Monday, January 31st, 2011

Last fall, when Transparent Concept Search was still in development, we showed an early version of it to a group of our customers. Their excitement was palpable, and they spent most of our session together comparing notes about the varied ways they would use it. But at the end of the discussion, one of them asked the question that was on everyone’s mind: “How much will you charge for it?” Or, as someone else immediately put it, “I get charged $200/GB for plain vanilla concept search, so how much of a premium do you think you will get for this?”

Our answer surprised them: there’s no charge. Transparent Concept Search is included in Clearwell for free. Here’s why doing that makes sense:

There are two business strategies in the technology industry which are proven to work. One is to be the low-cost provider and compete on price. These companies, such as Chinese PC manufacturers, spend very little on R&D or marketing. Instead, they ruthlessly squeeze out cost savings and pass them on to their customers. The other proven strategy is to be the innovation leader, whereby you continually delight customers by giving them more and more functionality at the existing price. Players following this strategy are never the cheapest, since they charge a little extra to fund new product development. For example, the iPhone is by no means the cheapest smartphone, but its price did not go up when, with the iPhone 4, Apple added video, a front-facing camera, better battery life, and a Retina display.

It is worth noting that either strategy can work, and companies sometimes move between the two, although making that transition is incredibly hard. Staying in the PC industry, Dell started as the low-cost provider, but has more recently tried to move up the value chain by investing more in the design of its products. The results, so far, have been mixed.

At Clearwell, our strategy is to be the innovation leader in e-discovery software. We tackle really hard technical problems, solve them in innovative ways, and then seek to delight our users by providing them with breakout, new capabilities at no incremental cost. Transparent Concept Search is a perfect example of this.

Rather than just integrate with concept analysis plug-ins, as pretty much every review platform does, we asked ourselves: if we were to create concept search from scratch specifically for e-discovery, what would we build? As part of that process, we tapped into the latest academic research in semantic analysis coming out of UCLA, the University of Pittsburgh, and other universities, and discovered that it offers a solution to the biggest single problem users have with concept search: the heavy computational burden that traditional approaches impose. By using a variation of the semantic space model described in that new research rather than, say, latent semantic indexing, we can deliver concept searching to much larger legal matters.

Beyond the core technology, we also wanted to change the user experience, by bringing the same level of visibility and control that our users enjoy in keyword search to this domain. Our goal is to enable users to balance both precision and recall in a way that was not previously possible. The result – Transparent Concept Search – is completely seamless within Clearwell in a way that simply cannot be matched by concept search plug-ins to a review platform, which are essentially two separate products from two separate vendors. In summary, it’s a vastly superior user experience – at no incremental cost.

This is the first of many things you will see from us this year. Our team could not be more excited about the new products and ideas that we have in the pipeline.

Concept Search in E-Discovery: From Concept to Reality

Sunday, January 30th, 2011

For years, concept search in electronic discovery has been like concept cars at auto shows: Cool. Slick. The thing that everyone is talking about.

But not ready to move to the assembly line and be put into production.

Like a concept car, concept search has been based on a lot of good ideas and shown a lot of promise. However, it has failed to move beyond a few edge use cases and reach mass adoption in the e-discovery market.  Why is this the case?

It’s not because it’s an unproven idea or because the basic technology hasn’t been available. In fact, the core algorithm that underlies most existing concept search technologies has been around since 1988, when a team from Bellcore (Bell Communications Research) filed the first patent on latent semantic analysis (LSA). Over the last 20 years, dozens if not hundreds of companies have sprung up to apply concept search to the broad area of enterprise search and to e-discovery in particular.

To understand why concept search has never taken off, it’s always interesting to look for parallels, and the parallel du jour is social networking. Readers of David Kirkpatrick’s excellent book The Facebook Effect and (perhaps to a lesser, more fictionalized extent) viewers of the movie The Social Network understand that Facebook was far from the first social networking site (remember MySpace? You won’t admit it, but I know you do). But, despite being several years late to the party, Facebook somehow took the core of the social networking idea and presented it to users in a way that really allowed it to “cross the chasm” to the mainstream market.

In introducing Transparent Concept Search, Clearwell plans to help conceptual search cross that same chasm in e-discovery.  In talking to customers over the last couple of years, we have found that there are unmet customer needs with existing concept search products that, once addressed, will really allow its use in e-discovery to flourish – and not just in a way that makes concept search marginally more useful, but, a la Facebook, makes it orders of magnitude more useful.

What are these unmet customer needs?

Ease of use: Historically, concept search has been relatively easy to use in the strictest sense of the word – you type in some terms that represent your concept, and you get a set of search results back, along with some related terms and/or clusters of related documents. Simple, right? The issue is that in most cases that’s not what the user really wants to do. Because concept search is inherently “fuzzy”, users want to be able to refine their concept based on the feedback that they got from their initial search. Concept search, just like keyword search, is an iterative process, and prior-generation technologies have not allowed for that form of iteration. In contrast, Clearwell’s Transparent Concept Search allows concepts to be defined and refined in a way that is intuitive, visual, and (don’t take my word for it, but try it for yourself) fun.

Precision: Traditional concept search increased recall when compared to just keyword search, but it came at the cost of precision. The refinement process facilitated by Clearwell’s Transparent Concept Search addresses this issue by allowing intelligent human input to guide the concept search process. You get the best of both the recall and precision worlds with vastly diminished time and effort.
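
For readers less familiar with the two terms, here is a minimal sketch of how precision and recall are computed, and of how removing off-topic results can lift precision without giving up recall. The document IDs and result sets are hypothetical and chosen only for illustration; this is not a description of how Clearwell measures search quality.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
relevant = {"d1", "d2", "d3", "d4"}

keyword_hits = {"d1", "d2"}                          # narrow: precise, but misses documents
concept_hits = {"d1", "d2", "d3", "d5", "d6", "d7"}  # broad: finds more, adds noise
refined_hits = concept_hits - {"d6", "d7"}           # reviewer removes off-topic results

keyword = precision_recall(keyword_hits, relevant)
concept = precision_recall(concept_hits, relevant)
refined = precision_recall(refined_hits, relevant)

for name, (p, r) in [("keyword", keyword), ("concept", concept), ("refined", refined)]:
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

In this toy case the broad search raises recall from 0.50 to 0.75 while precision falls from 1.00 to 0.50, and the refined search recovers precision to 0.75 with recall unchanged.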

Defensibility: Even more important than ease of use and precision is defensibility. Defensibility, for those new to the term, isn’t so much about whether the algorithms are known and can be understood. They are known, and they aren’t that complicated. Rather, defensibility is about reasonableness: was the concept search a reasonable way of determining which documents are responsive? Without the ability to define your concept in an interactive manner, we believe that the answer has historically been “no”, making concept search nice in theory but unusable in actual legal practice. Transparent Concept Search promises to change that. The end result is a more defensible search process that yields both greater recall and greater precision, enabling users to more quickly analyze case facts, rapidly identify key documents that may have been missed, eliminate irrelevant documents, and prioritize the most relevant documents for review. Clearwell also provides a reporting and auditing feature to document your search, allowing you to improve defensibility by “proving up” what was done.

Low cost: Finally, never underestimate the value of “free” in helping meet the ever-important unmet need of cost predictability and control. Historically, vendors have charged price premiums (often substantial) for concept search. Try to charge a premium in e-discovery for something that doesn’t fully meet the customer use case and isn’t defensible, and it’s a recipe for low adoption. However, provide a highly usable, effective, and defensible capability as part of the core functionality of today’s leading e-discovery platform, and it starts to look very attractive indeed.

Hopefully you can tell that we’re incredibly excited about the promise that this technology holds for the market, and this initial version is really just the beginning. Want to see it for yourself? Check out the video below, visit our web site or, if you are in New York this week, please visit us at LegalTech New York – we would love to see you.

Why Transparent Search In E-Discovery Is The Answer To Victor Stanley

Tuesday, August 26th, 2008

In my last post, I discussed how the “black box” design of enterprise search engines makes it challenging to defensibly use keyword search in e-discovery and follow Judge Grimm’s guidance in Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md. May 29, 2008). In Victor Stanley, Judge Grimm notes that because keyword search technology is prone to producing over- and under-inclusive results, attorneys using keyword search should adopt one of two approaches: either collaborate with the opposing party to agree on keyword search methodology, or utilize best practices that demonstrate they have taken reasonable measures to reduce over- and under-inclusiveness. However, the black box search technologies that are used in e-discovery today make following this guidance difficult. They can’t reduce under-inclusiveness without increasing over-inclusiveness. And they make it expensive to utilize collaborative or best practices methodologies, including testing, sampling, refining and documenting searches. All of which raises an obvious question: what can be done to improve search for e-discovery?

In my opinion, the answer is simple: e-discovery search needs to become more transparent. Instead of being forced to feed one search query at a time into a “black box” search engine and then getting results with no idea how those results were generated, lawyers and litigation support professionals need technology that provides them with greater visibility into the search process. They need to understand how the results were obtained, so they can reduce both the over- and under-inclusiveness of keyword search, and easily follow Judge Grimm’s advice to improve the defensibility of their search methodology.

A transparent search solution should have four key elements:

  1. Transparent query expansion. Query expansion is the process by which search engines take the query that the user submitted and expand or convert it into a new and improved form. Wildcard, stemming, concept and fuzzy searches all follow this query expansion process. For example, the search “divers*” would be expanded to search for all the words that start with “divers” in the data set, such as “diverse,” “diversity,” “diversion,” “diversification,” etc. In transparent search, query expansion would be exposed to users, allowing them to include or exclude expanded keywords. To continue with the previous example, a user who is searching for documents related to diversity would then have the ability to exclude false positive expanded terms, such as “divers,” “diversion,” and “diversification,” from the search. Making query expansion transparent can significantly reduce the over-inclusiveness of keyword search. It also makes it practical to use technologies, such as concept and fuzzy search, that have not been used to date because of their complexity and tendency to produce massively over-inclusive results.
  2. Multiple query support. When a search contains multiple keyword queries, such as “hiring” and “interview,” transparent search should provide visibility into the results for each individual query as well as the combination of all the queries. For example, with the search “hiring OR interview,” users should have separate visibility into the results for “hiring” and “interview” as well as “hiring OR interview.” They should know that out of the 100 documents that match “hiring OR interview,” only 5 match “interview” and 95 match “hiring.” This kind of visibility is critical if you want to either collaborate or follow search testing, sampling, and refinement best practices when there are a large number of queries.
  3. Rapid sampling. Transparent search should support the ability to rapidly sample the results from all of the individual queries, such as “hiring” and “interview”, contained within a search. It should also be easy to take a random sample of non-matching documents in order to assess whether one or more searches have identified as many of the relevant documents as possible.  As Judge Grimm states in Victor Stanley when assessing keyword searches used to find privileged documents, “The only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive.”
  4. Automated documentation. Transparent search technology needs to document all aspects of the search process including (but not limited to) any keyword that has been excluded during transparent query expansion, the combined results of a search containing multiple individual queries, and the results for each of the individual queries within that search.  Automatically documenting the search methodology used and the results obtained is critical so that users can “show their work” if their search methodology is ever called into question.
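
To make the four elements concrete, here is a minimal sketch of how they might fit together on a toy data set. Everything in it is an assumption made for illustration: the five-document corpus, the helper names `expand` and `run_query`, and the shape of the `report` record are hypothetical, and none of it describes how any particular e-discovery product is implemented.

```python
import fnmatch
import random

# Hypothetical toy corpus: document ID -> text. Not from any real matter.
DOCS = {
    1: "our diversity hiring program starts in june",
    2: "the divers surfaced near the reef",
    3: "interview schedule for the hiring committee",
    4: "a diversion of funds was discussed",
    5: "quarterly sales figures attached",
}


def tokens(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def expand(pattern, docs):
    """List every distinct word in the corpus that a wildcard pattern matches."""
    vocab = set().union(*(tokens(t) for t in docs.values()))
    return sorted(w for w in vocab if fnmatch.fnmatchcase(w, pattern))


def run_query(terms, docs):
    """Return IDs of documents containing at least one of the given terms."""
    wanted = set(terms)
    return {doc_id for doc_id, text in docs.items() if tokens(text) & wanted}


# 1. Transparent query expansion: show the expansion, let the user exclude terms.
expanded = expand("divers*", DOCS)
excluded = {"divers", "diversion"}
kept = [w for w in expanded if w not in excluded]

# 2. Multiple query support: hits per individual query and for the OR combination.
queries = {"divers*": kept, "hiring": ["hiring"], "interview": ["interview"]}
hits = {name: run_query(terms, DOCS) for name, terms in queries.items()}
combined = set().union(*hits.values())

# 3. Rapid sampling: a random sample drawn from the non-matching documents.
non_matching = sorted(set(DOCS) - combined)
rng = random.Random(0)  # fixed seed so the same sample can be drawn again
sample = rng.sample(non_matching, k=min(2, len(non_matching)))

# 4. Automated documentation: a record of what was run and what came back.
report = {
    "expansions": {"divers*": {"kept": kept, "excluded": sorted(excluded)}},
    "hits_per_query": {name: sorted(ids) for name, ids in hits.items()},
    "combined_hits": sorted(combined),
    "non_matching_sample": sample,
}
print(report)
```

Because the user excluded “divers” and “diversion” from the expansion, the documents about scuba divers and the diversion of funds drop out of the results, and the report records exactly which terms were excluded.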

Benefits of Transparent Search

By addressing the main technology challenges of keyword search, transparent search provides significant benefits to attorneys and litigation support professionals using search for e-discovery. First, parties that adopt transparent search can improve the defensibility of their e-discovery search practices. By enabling iterative testing, sampling and refinement, transparent search allows users to adopt the approaches recommended by Judge Grimm when it was previously impractical to do so.  At the end of the day, this means less risk.

Second, the use of transparent search can substantially reduce downstream production and review costs by removing false positives. For example, it is not uncommon for certain wildcard searches to generate results where 20-40% of the included documents are false positives that can be removed by transparent query expansion.  This can result in thousands of dollars of savings on a single search query.
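
As a rough back-of-the-envelope sketch of that arithmetic, the snippet below estimates the review cost avoided when false positives are removed before review. The hit count and the per-document review cost are hypothetical inputs picked purely for illustration; only the 20-40% false positive range comes from the paragraph above.

```python
def review_savings(hit_count, false_positive_rate, cost_per_document):
    """Estimate documents removed and review cost avoided for one search."""
    removed = round(hit_count * false_positive_rate)
    return removed, removed * cost_per_document


# Hypothetical inputs: 10,000 hits and $1.00 of review cost per document.
low = review_savings(10_000, 0.20, 1.00)
high = review_savings(10_000, 0.40, 1.00)

print(f"20% false positives: {low[0]} documents removed, ${low[1]:,.2f} avoided")
print(f"40% false positives: {high[0]} documents removed, ${high[1]:,.2f} avoided")
```

Under these assumed inputs, a single wildcard search would avoid between $2,000 and $4,000 of review cost; real figures depend on the size of the matter and on actual review rates.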

Finally, transparent search can dramatically reduce the time and cost required to complete the search and culling stage of e-discovery. Currently, it can take hundreds of hours to run a significant number of searches one at a time, document the results of each search, and sample and refine each individual query. With transparent search, running multiple queries and documenting each of the individual results takes minutes. Sampling each of the individual queries takes seconds.

When it comes to e-discovery search, it’s important to recognize that there are no “silver bullets.”  Search will remain an imperfect science with the possibility of over- and under-inclusive results.  But equally, there is no doubt that search remains the best solution for reducing the vast quantities of electronic information that are a part of every e-discovery process down to a reasonable level for human review. While attorneys and litigation support professionals can’t completely remove the imperfections of keyword search, they can, with transparent search, take action to minimize the impact of these imperfections and defensibly meet the requirements of new case law.  In doing so, they will be able to turn their attention to where it should be: the substance of the case.