
Concept Search Versus Keyword Search in Electronic Discovery

Wednesday, November 12th, 2008

In my last post, I started a discussion on the myths surrounding concept search.  The first myth I dispelled was the “concept search is concept search” myth: the idea that there is an agreed-upon definition of concept search.  In actuality, when people in e-discovery use the term concept search, they don’t always mean the same thing.  Frequently they are not talking about concept search technology at all but about concept or content categorization technology, which is very different.  The second myth that needs dispelling is that concept search is better than keyword search.

The thinking behind this myth goes something like this:

Keyword search has a lot of problems.  It is prone to being over-inclusive, i.e., finding some non-relevant documents, and under-inclusive, i.e., not finding some relevant documents.  Concept search technologies are new and interesting and using these technologies you can find documents that keyword search can’t find.  Therefore, concept search must be better than keyword search.

Let’s examine this thinking.  The first two statements are accurate.  Keyword search is not perfect and can produce over- and under-inclusive results.  And concept search and content categorization technologies can both help identify documents that keyword search technologies might not find.  However, the conclusion that concept search is better than keyword search is not valid and doesn’t follow from these two statements.  Why?

In order to answer this question, we first need to go back to the difference between concept search and content categorization. Because these are different technologies, we really need to separately compare concept search versus keyword search and content categorization versus keyword search.  Let’s start with content categorization and keyword search.

The issue with this comparison is that keyword search and content categorization do different things.  Keyword search can be used in many ways in e-discovery.  The two most common are: (1) analysis or case assessment: finding the hot documents and understanding the matter by determining who knew what, when, how and why, etc., and (2) culling: removing non-responsive documents and/or identifying potentially privileged documents in order to reduce a large, starting set of documents to a smaller set before review.

Content categorization, on the other hand, has historically been used within the review phase of e-discovery.  Categorization can help reviewers to better understand the documents they are reviewing and thus potentially increase the speed of review.  Practitioners with whom I have worked also find that categorization can be useful during analysis by helping to understand a matter and identify potentially important keywords.

However, content categorization has not been used as part of culling.  First, culling needs to be transparent.  You need to be able to reach agreement with the opposing side and the court on exactly how you have culled the data set, or at least be able to explain it to them.  If you cull based on categories of documents that have been generated by a proprietary, black-box algorithm, it’s going to be difficult to gain agreement on or explain your culling methodology.  This is why the typical method of culling is still to use keyword search and either to agree on the set of search terms with the opposing side or to use e-discovery search best practices to perform keyword searches on your own.

Second, content categorization has its own issues when it comes to being over- and under-inclusive.  There is no guarantee that the group of documents categorized as being related to, for example, a company’s hiring policies includes all of the documents in your matter related to hiring policies, or that it excludes documents that are not really related to hiring policies.  Content categorization, like keyword search and virtually every information retrieval technology, is not perfect.

So what about concept search technology?  Surely, concept search technology is better than old, boring keyword search.  Well, actually it’s not that clear-cut.  The problem with concept search technology is that while it might find more relevant documents than plain keyword search, it will also likely find more false positives.  Imagine searching for documents containing “terminate” in an employment matter and your concept search technology automatically searching for “fire”, “dismiss”, etc. as well.  You’ll find more documents related to the termination of employees, but you’ll also find a lot more non-relevant documents concerning house fires, the fire department, etc.
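The “terminate” example can be made concrete with a small sketch.  Everything in it is an assumption for illustration: the six-document corpus, the relevance labels, and the hand-written expansion table that stands in for whatever a real concept search engine does internally.  It simply measures precision (how many hits are relevant) and recall (how many relevant documents are hit) for the plain keyword query and for the expanded query.

```python
import re

# Hypothetical toy corpus: (text, is_relevant_to_employee_termination).
DOCS = [
    ("We decided to terminate his employment on Friday.", True),
    ("HR will fire the contractor next week.", True),
    ("The board voted to dismiss the regional manager.", True),
    ("The fire department inspected the warehouse.", False),
    ("A house fire delayed the shipment.", False),
    ("Quarterly sales figures are attached.", False),
]

# Hypothetical expansion table standing in for a concept search engine.
EXPANSIONS = {"terminate": ["fire", "dismiss"]}


def search(terms):
    """Return the indices of documents containing any term as a whole word."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return {i for i, (text, _) in enumerate(DOCS) if pattern.search(text)}


def precision_recall(hits):
    """Precision: share of hits that are relevant. Recall: share of relevant docs hit."""
    relevant = {i for i, (_, is_relevant) in enumerate(DOCS) if is_relevant}
    true_positives = len(hits & relevant)
    precision = true_positives / len(hits) if hits else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


keyword_hits = search(["terminate"])
concept_hits = search(["terminate"] + EXPANSIONS["terminate"])

print("keyword:", sorted(keyword_hits), precision_recall(keyword_hits))
print("concept:", sorted(concept_hits), precision_recall(concept_hits))
```

On this toy data the keyword query finds one of the three relevant documents with no false positives, while the expanded query finds all three but also pulls in both “fire” documents that have nothing to do with employment.  Real collections will not split this neatly, but the trade-off has the same shape.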

So concept search can help address the under-inclusive problem with keyword search (though it won’t solve it) and can be helpful during analysis.  But it can often increase the over-inclusive problem.  In addition, today’s concept search technologies share the transparency problem with content categorization.  These technologies have largely been designed as “black boxes”, which, as I have discussed in the past, makes sense for enterprise search but not for e-discovery search, and, as a result, they could also be difficult to explain and defend.  For these reasons, concept search technology isn’t used very much in e-discovery today.  In order for its use to become widespread, it will need to become more transparent.  But that’s a topic for another day.

The bottom line here is that despite all the hype, concept search and content categorization technologies do not solve all the challenges of e-discovery search.  Both of these technologies can be very useful and the technology behind them is always improving.  However, as most of the experienced practitioners I work with already know, these technologies are generally better thought of as supplements to keyword search, not replacements.  The important question is not whether to use one technology over the other but which technology is best suited to your objectives and how best to use all the available technologies to achieve the desired goal.

FTI Consulting Acquires Attenex for $88 million

Wednesday, June 11th, 2008

Assuming that you can buy each company for the same price, which would you acquire?

Company A has been in business 3 years, has 25 customers, no brand to speak of, and did about $5 million in revenue in the prior year; or,

Company B has been in business 7 years, has over 100 customers, a strong brand in its market, and is doing $25 million in annual revenue?

“No brainer,” you say, “obviously, Company B.” So it is that FTI looks to have got a great deal buying Attenex (Company B) today for $88 million, whereas Seagate looks like it grossly overpaid for Metalincs (Company A) which it bought for $82 million in December 2007. But things are not always as they appear, and there are good reasons why Attenex has sold for a paltry 3.5x revenue, a multiple well below the 16x commanded by Metalincs or even the 5x revenue that Iron Mountain paid for Stratify.
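The multiples quoted above are simply purchase price divided by annual revenue. A minimal sketch of that arithmetic, using the approximate figures given in this post (prices and revenues in millions of dollars; the revenue numbers are the rough ones from the Company A and Company B descriptions, not audited figures):

```python
# (purchase price, annual revenue) in $ millions, as quoted in this post.
DEALS = {
    "Attenex": (88, 25),
    "Metalincs": (82, 5),
}


def revenue_multiple(price, revenue):
    """Purchase price expressed as a multiple of annual revenue."""
    return price / revenue


multiples = {
    name: round(revenue_multiple(price, revenue), 1)
    for name, (price, revenue) in DEALS.items()
}

for name, multiple in multiples.items():
    print(f"{name}: {multiple}x revenue")
```

This gives roughly 3.5x for Attenex and a little over 16x for Metalincs, in line with the multiples cited above.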

Three forces reduced Attenex’s acquisition price. The first is that FTI accounted for a large proportion of Attenex’s revenue. That gave FTI leverage over Attenex since it could say, “sell to us for $88 million, or we will take our business elsewhere, your revenue will plummet, and the value of your business will be greatly reduced.” This power that FTI had over Attenex made it the only logical acquirer, so there could be no pressure from other bidders to raise the purchase price.

The second force depressing Attenex’s valuation is that its revenue will likely decline post-acquisition as Attenex’s partners (who compete with FTI) switch from Attenex to other solutions. Software investors value growth above all else – and are willing to pay up for it. For example, Bladelogic, an unprofitable software company, went public last year at a $500 million valuation with less trailing revenue than Attenex. But it did $62 million in revenue the following year (Bladelogic sold to BMC Software for $800 million in April 2008). Attenex, by contrast, will see declining revenue in the next 12 months.

Finally, acquirers worried that, since Attenex’s revenue comes almost entirely from its hosted offering via service providers, its revenue was more volatile than that of enterprise-oriented e-discovery software companies. This is because customers (typically, law firms) purchase Attenex-powered services on a case-by-case basis and can switch away at any time. Enterprises, in contrast, purchase long-term software contracts that will not vary based on short-term changes in case volume.

Once these factors are taken into account, the price and the multiple start to look a lot better. Attenex’s founders, who are some of the pioneers of the e-discovery industry, get some well-earned liquidity; the venture investors make a decent return; and employees get to join a professionally run company that compensates its people well. My congratulations to the Attenex team, and to FTI, which has negotiated a great deal.

Of course, all this says nothing about the deal’s impact on the broader e-discovery market. That will be the subject of my next post.