Posts Tagged ‘Recommind’

Patents and Innovation in Electronic Discovery

Monday, June 13th, 2011

In the world of technology we live in, a huge amount of benefit is created when people apply certain well-known techniques to solve problems and create value to the broader community. Such techniques are often the result of painstakingly long and laborious research, driven primarily by academic institutions with private industry either funding such research directly or by co-opting them in their own work. When the industry as a whole recognizes a certain methodology, it gains popular usage.

In information retrieval, searching and retrieving relevant content from unstructured text has been a vexing problem, and we’ve had decades of the brightest minds applying their collective intelligence and the rigors of peer review to validate and establish the most effective way to solve a retrieval problem. And, research forums such as TREC, SIGIR and other information retrieval conferences establish a venue for advancing the state of the art. So, when Recommind announced that they have been issued a patent on Predictive Coding, I took notice, especially since it touches a nerve with those who believe research should be openly shared.

The patent lists six claims that describe a workflow whereby humans review and code a document and the coding decisions applied to the document sample are projected or applied to the larger collection of documents. Anyone who has even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback as a way to perform information retrieval has been studied for well over forty years, with a paper as early as 1968 by Rocchio J.J., titled Relevance Feedback in Information Retrieval. It falls under a category of methods broadly known as machine learning.

Any supervised machine learning system involves creating a training sample and using that sample to project into a larger population. The fact that one could claim patentable ideas on something that is so widely known and used is puzzling.  Any workflow that employs machine learning would include the steps of creating an initial control set, coding that by human review, and applying the learned tags to a larger population.  In fact, the Wiki article Learning to rank describes precisely the workflow that is claimed in the patent and as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper with iterative sampling based evaluation and automatic expansion of initial query.  In that paper, we describe exactly the workflow postulated by the six claims of the patent.

In terms of other prior art that would potentially invalidate the patent, the list is long. Let’s start with Text Classification. Text Classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the Proceedings of Sixteenth International Conference on Machine Learning, as well as his book Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, published by The Springer International Series in Engineering and Computer Science.  Now a well-recognized Professor of Computer Science at Cornell University, that work is widely cited as a seminal work on the area of machine learning and text classification. Interestingly, this work was cited by the Patent Examiner as prior art, but the inventors missed listing it. Nevertheless, that work and further work by several academics such as Leopold and Kindermann has already established the use of Support Vector Machines as a useful technique for machine learning. To claim the novelty of its use in automatically coding documents is, in my opinion, a hollow claim.

Another technology mentioned in passing is Latent Semantic Indexing (LSI). This is proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W.,Landauer, T.K., Harshman R. in their paper, Indexing by Latent Semantic Analysis, in Journal of the ASIS, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching and text classification is also very widespread, and once again, it seems ridiculous to claim that it is something novel or innovative.

Next, let’s examine the use of sampling to validate the initial control set. Use of sampling for validation of a control set of documents is in fact such a widely known technique that most e-discovery productions employ sampling. In fact, the Sedona Commentary on Achieving Quality and the EDRM Search Guide recommend use of sampling to validate automated searches. Furthermore, several E-discovery opinions such as Judge Grimm’s opinion in Victor Stanley [Victor Stanley, Inc. v. Creative Pipe, Inc. , 2008 WL 2221841 (D. Md., May 29, 2008)]  suggests that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.

In short, we think the claims issued in the patent and the associated workflow are so commonly used that the workflow is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant a re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to pick up existing prior art and devise a similar workflow that achieves the same or better outcome, and attempt to enforce the patent will likely be challenged.

But there is an even bigger issue at stake here beyond the status of Recommind’s patent: namely, shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license their technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery over two years before Recommind announced their first version of Axcelerate.

Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the sort of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect investments by an organization are fine, but process patents that attempt to patent a workflow are bad for business. With such an approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much more room to grow and develop) can be fully realized in a way that both provides vendors with the fair and just economic rewards they deserve while helping the legal system become radically more efficient.

Recommind Publicly Discloses Its Revenue – And It’s Less Than You Might Think

Wednesday, October 13th, 2010

Working in the industry, I have a good sense for the annual revenue generated by most e-discovery software vendors. I do not write about it, because these companies are almost all private and prefer to keep their revenue numbers confidential. But when a private company publicly discloses its annual revenue, then the numbers are in the public domain and it’s reasonable to examine them.

That’s the situation with Recommind, which has chosen to disclose its revenue in an effort to draw attention to the company. Recommind participated in industry surveys conducted by Deloitte in 2009 and Inc Magazine in 2010, revealing its revenue for the years 2006, 2008, and 2009. The company then spoke to the451 Group, which published a report on August 25, 2010, stating: “[Recommind] predicts 60-90% overall growth for the year – potentially topping even 2008′s 70% growth rate”. If you apply these growth rates to the previously disclosed revenue numbers, it’s simple arithmetic to calculate Recommind’s revenue in 2007, and its revenue forecast for 2010. The data is collated in the table below:

Recommind Revenue 2006-2010

2006

2007

2008

2009

2010

Revenue

$4.6M

$8.5M

$14.4M

$14.7M

$23-28M

Annual Growth Rate

85%

70%

2%

60-90%

Source

Inc Magazine
(1)

451Group
(2)

Deloitte
(3)

Inc
Magazine
(1)

451Group
(2)

(1)     Inc Magazine’s 5000 List for 2010 ranked Recommind #1334
(2)     From the451 Group: “Recommind rallies with strong growth and more hosted e-discovery traction” by Nick Patience and Katey Wood, August 25, 2010
(3)   Deloitte’s 2009 Technology Fast 500 ranks Recommind #251

I found two things striking about these numbers. First, Recommind’s business clearly hit the wall in 2009, when it grew only 2% at a time when other e-discovery software companies like Clearwell, Exterro, kCura, and others were all growing at a rapid pace. Things appear to have turned around for the company since then, and the 60-90% annual growth expected in 2010 raises it firmly to the middle-tier of industry players.

Second, it’s interesting to note how small these numbers are. From an electronic discovery software perspective, they look even smaller when you consider that Recommind’s expected $23-28 million in 2010 revenue comes from 3 different markets:

1. Enterprise search: For example, the company recently announced that it has been selected by Staples to power search on Stapleslink.com

2. Knowledge management for law firms: This accounts for the bulk of customer testimonials on its website.

3. E-Discovery: Its Axcelerate product has been adopted by a handful of enterprises and law firms for processing, early case assessment (ECA), and review.

If you divide its revenue estimate for 2010 equally across these three markets, it suggests that Recommind’s e-discovery revenue in 2010 is only $7-10 million. That’s much less than many other companies in the space.

What can we take away from all this? First off, you have to wonder if it’s a good idea to disclose this information, since it exposes the fact that Recommind is a lot smaller than many people think. More generally, it’s an interesting case study in how, by selecting the right time period and calculating high percentage growth rates off small numbers, a company can gain recognition despite its small size and erratic growth.

Learn More On Litigation Software & Litigation Support Software.

Cutting Through The Confusion: A Buyer’s Guide To Electronic Discovery Software

Sunday, April 19th, 2009

Over the past 4 years, I have had hundreds of conversations with corporate counsel and “legal IT”, meaning technical folks charged with supporting the legal team. More and more of them are looking to lower their costs by bringing e-discovery in-house. But as they work through that process, there’s one question that consistently comes up, even today – namely, “When [insert name of software company] says they “do” e-discovery, what exactly does that mean?”

There has been progress towards answering this question, thanks mainly to the analyst community. George Socha and Tom Gelbmann’s EDRM framework has been immensely helpful in breaking down electronic discovery into its component steps. Other analysts, like Debra Logan at Gartner, were quick to embrace the framework, prompting every software provider to follow suit. As a result, there is today a common language that everyone uses to describe the e-discovery process.

The Electronic Discovery Reference Model (EDRM) breaks down the e-discovery process into a series of steps. Companies looking to buy e-discovery software to lower costs typically map different software products to each of these steps, to make sure that they cover the entire process.
The Electronic Discovery Reference Model (EDRM) breaks down the e-discovery process into a series of steps. Companies looking to buy e-discovery software to lower costs typically map different software products to each of these steps, to make sure that they cover the entire process.

But having a universally-agreed framework is only half the answer. To eliminate customer confusion, there also needs to be agreement on how different software products fit into the framework. This is especially important since there is no single, end-to-end solution for e-discovery which covers all aspects of EDRM. So customers are forced to think about how different software solutions fit together. And that is where things begin to fall apart.

Many software vendors feel it is advantageous to claim that they do everything, even though they do not. Customers are rightly suspicious of those claims, and so press vendors to provide more detailed information – hence the question, “when you say you do e-discovery, what exactly does that mean?”

In light of that, how can litigation support teams, corporate counsel, or legal IT people figure out which e-discovery solution best meets their needs? From observing this decision-making process hundreds of times, I have found 3 simple steps are incredibly helpful.

Step 1: Read the analyst reports

Two reports in particular make for required reading. One is Gartner’s MarketScope Report, which is available for free at certain sites; the other is the 451Group’s recent e-discovery report, which is summarized in a publicly available presentation. The helpful thing about the 451 Group’s report is that it tells you which software companies do which parts of the EDRM process. You do have to buy the report to get the full picture (it’s well worth it!), but the publicly available presentation will give you a flavor for their analyis, and I have drawn from that presentation in the figure below:

Analyst firms like the 451 Group map software vendors to the EDRM framework according to what they actually do, which is often different from what software vendors claim they do.
Analyst firms like the 451 Group map software vendors to the EDRM framework according to what they actually do, which is often different from what software vendors claim they do.

The 451 Group’s analysis highlights several important points. First, it shows that there is no single end-to-end solution. Even the products of giants like EMC (SourceOne), HP (IAP), and IBM (CommonStore) only solve one piece of the puzzle, information management. Second, it shows that customers have choices at each stage of the EDRM process. For example, to solve the problem of identification, collection, and preservation of electronic information, customers can choose from solutions as diverse as Guidance EnCase (forensic collection), Index Engines (back-up tapes) and Mimosa NearPoint (email archive). Third, it provides an independent assessment of what vendors do, as opposed to what they may claim. For example, Kazeon claims analysis and review capabilities, whereas the report shows its product does identification, collection, and preservation; Recommind claims its Axcelerate eDiscovery and MindServer products do processing, whereas the report finds that they do not.

Step 2: Evaluate the products prior to purchase

Just as anyone would test-drive a car prior to purchase, it’s critical to test-drive e-discovery software. Any vendor should be willing to provide their software free of charge for an evaluation on-premise. The most effective evaluations are when the customer uses the product themselves, either on a live case or test data. This is far preferable to just sending the data to the vendor who then loads it into their system, as in that scenario there are too many opportunities for the vendor to hide their product’s shortcomings.

Step 3: Check references carefully

The trick with references is to insist on relevant references. It’s not good enough for the vendor to dredge up some random person who says nice things; or even a credible knowledgeable person who is using the product in a completely different way. For example, if a company is happy with Autonomy’s IDOL for enterprise search, that does not tell you much about what Autonomy might be like for e-discovery. What really counts are references from other customers who are using the product for the same application that you are.

All this can sound like a lot of work, but I have seen people go through the process in as little as a month, and be much happier for it. A little work up front can save a lot of time (and heart-ache!) later on.