Posts Tagged ‘e-mail’

Social Media and eDiscovery: New Kid on the Block, but the Same Story

Friday, September 30th, 2011

In the eDiscovery universe, hot trends and evolving technologies tend to capture the attention of the legal community.  Discoverable data sources have been the focus in the courtroom for quite some time, and just like the “popular kids” from high school, email has held the crown of eDiscovery darling.  Not surprisingly, the more time end-users spend in a specific medium (on Facebook, for example), the more likely data will be created – and as that data multiplies, it has the potential to become compelling in discovery.  It seems that many U.S. organizations are electing to allow social media use at work and for work, rather than blocking access.  For obvious reasons, granting this access is culturally desirable, but from an eDiscovery perspective social media use introduces new complications.  However, don’t be mystified.  There is nothing that new here.

Recently, Symantec issued the findings of its second annual Information Retention and eDiscovery Survey, which examined how enterprises are coping with the tsunami of electronically stored information.  Having lost some popularity, email came in third place (58%) to files/documents (67%) and database/application data (61%) when respondents were asked what type of documents were most commonly part of an eDiscovery request.  The new kid on the block for data sources is social media, reported by 41% of those surveyed.  Social media is in essence no different from any other data type in the eDiscovery process; it’s just the newest.  Said another way: social media is the new email.

Of course, it’s no longer news to proclaim that communications from social networking sites are discoverable.  What is newsworthy is the question of how to effectively store, manage and discover these communications which come in such varying forms, making the logistics of doing so for social media different than for traditional mediums.  Like email, social media is used by everyone (ubiquitous), is viral (fast), has mixed uses (professional and personal) and there is a lot of it (high volume).  Unlike email, social media comes in many different forms (Facebook, LinkedIn, Twitter, etc.), is not controlled within an organization’s firewalls (custody, possession and control issues), and has more complex requirements within the information governance lifecycle (technology is needed to ingest social media into an archive).

The two main areas to examine in relation to social media use and an organization’s policies are: 1) the legal issues that apply specifically to the organization, and 2) the logistical and technical requirements for preservation and collection.  Essentially, what is the organization’s policy surrounding social media use, and how can the information be accessed if need be? Luckily, technology exists that is nimble enough to be able to ingest social media and archive it in accordance with an organization’s policy, should one exist.  Organizations that have recognized social media as the newest kid on the block have, ideally: developed a social media policy, purchased (or deployed) collection and retention technology, and instituted training for their employees.  They have also integrated social media into their information governance strategy and document retention policy. Remember, not all organizations will have to archive social media, but all should address social media with a policy and training.

Other organizations have not accepted social media as part of the evolutionary process of eDiscovery.  They proceed at their own peril – as did the organizations that did not control their email some ten years ago!

These organizations will be in crisis when they need to collect social media for litigation and will most likely have a large lesson in damage control, as well as an equally large bill.  They will be ill-prepared, overwhelmed and uneducated about how to discover social media.  Without a policy, they will have to over-collect by default, which will drive up the costs for collection and possibly for downstream review.  Given that the aforementioned survey found nearly half of the respondents did not have an information retention policy in place, and of this group, only 30% were discussing how to do so, it is likely that many of these organizations do not yet have a social media policy either.

With this background in mind, organizations should evaluate which laws and regulations apply to their organization, develop a policy and train their employees on that policy.  Plus ça change, plus c’est la même chose.

For more information about how IT and Legal can manage the impact of social media on their organization and to learn how archiving social media can be accomplished, please join this webcast from Symantec.

Proactive Retention Means Effective Preservation in eDiscovery

Thursday, September 22nd, 2011

It is axiomatic that the law helps those who help themselves.  Perhaps nowhere is that truism more applicable than in the context of electronic discovery.  The organization that implements an effective information governance strategy – including developing reasonable data retention policies – will likely avoid court sanctions and reduce its legal costs.  This was confirmed in a recent industry survey, which found that organizations “help themselves” when they develop information retention policies.  According to the survey, better retention practices drive dramatically better outcomes in litigation, particularly in the context of retention and preservation.

Such a finding is echoed by a recent case issued from the District of Indiana.  In Haraburda v. Arcelor Mittal U.S.A., Inc. (D. Ind. June 28, 2011), the court tied a litigant’s preservation duty to its document retention efforts.  In order to discharge its duty to reactively preserve evidence, the court reasoned that enterprises must proactively create “a ‘comprehensive’ document retention policy that will ensure that relevant documents are retained.”  Failing to implement a retention policy often results in a loss of key information.  And this, opined the court, may result in sanctions.

Such a finding is not limited to an isolated case.  Court decisions from across the United States in 2011 have found the same connection: better data retention practices yield more successful document preservation results.  For example, in E.I. du Pont de Nemours v. Kolon Industries (E.D. Va. April 27, 2011), the plaintiff manufacturer defeated a sanctions motion due to its effective information retention procedures.  The manufacturer implemented a document retention policy that typically kept emails from former employee accounts for 60 days, after which the emails were overwritten and deleted.  Among the emails deleted pursuant to that policy were several that the defendant argued were relevant to its counter-claims.  The DuPont court declined to impose sanctions, however, since the emails in question were overwritten before the duty to preserve was triggered.  Instead, the court lauded the manufacturer’s preservation efforts, finding that it “took positive steps reasonably calculated to ensure that information . . . was preserved for litigation.”  Because the manufacturer faithfully observed its established retention policy, it reduced a stockpile of email, made relevant documents unavailable for discovery and was still protected from court sanctions.

Similarly, in Viramontes v. U.S. Bancorp (N.D.Ill. Jan. 27, 2011), the defendant bank relied on its data retention protocols to stave off a sanctions motion after deleting several years of email.  Because those emails were destroyed pursuant to a neutral retention policy before a preservation duty attached, the bank was protected from sanctions under the Federal Rule of Civil Procedure 37(e) safe harbor for the destruction of electronic information.

The converse, of course, is also true.  Those organizations that failed to implement effective retention policies have fared poorly in discovery because they have not preserved relevant ESI.  Take the defendant, for instance, in Northington v. H & M International (N.D.Ill. Jan. 12, 2011).  The court issued an adverse inference jury instruction against that company because it spoliated significant emails and other data.  The genesis of this spoliation was the company’s failure to establish a formal document retention policy.  Instead of having a thoughtful, top-down approach, “data retention . . . was evidently handled on an ad hoc, case-by-case basis.”  The company’s failure to develop a pre-litigation information retention policy eventually led to the loss of key information and the court’s sanctions award.

These recent cases and others confirm the correlation between retention and preservation.  Simply put, proactive retention leads to better preservation in eDiscovery.  Anything less could be disastrous in litigation.

Email Isn’t eDiscovery Top Dog Any Longer, Recent Survey Finds

Sunday, September 18th, 2011

Symantec today issued the findings of its second annual Information Retention and eDiscovery Survey, which examined how enterprises are coping with the tsunami of electronically stored information (ESI) that we see expanding by the minute.  Perhaps counterintuitively, the survey of legal and IT personnel at 2,000 enterprises found that email is no longer the primary source of ESI companies produced in response to eDiscovery requests.  In fact, email came in third place (58%) to files/documents (67%) and database/application data (61%).  Marking a departure from the landscape as recently as a few years ago, the survey reveals that email does not axiomatically equal eDiscovery any longer.

Some may react incredulously to these results. For instance, noted eDiscovery expert Ralph Losey continues to stress the paramount importance of email: “In the world of employment litigation it is all about email and attachments and other informal communications. That is not to say databases aren’t also sometimes important. They can be, especially in class actions. But, the focus of eDiscovery remains squarely on email.”   While it’s hard to argue with Ralph, the real takeaway should be less about the relative descent of email’s importance, and more about the ascendency of other data types (including social media), which now have an unquestioned seat at the table.

The primary ramification is that organizations need to prepare for eDiscovery and governmental inquiries by casting a wider ESI net, including social media, cloud data, instant messaging and structured data systems.  Forward-thinking companies should map out where all ESI resides company-wide so that these important sources do not go unrecognized.  Once these sources of potentially responsive ESI are accounted for, the right eDiscovery tools need to be deployed so that these disparate types of ESI can be defensibly collected and processed for review in a singular, efficient and auditable environment.

The survey also found that companies which employ best practices such as implementing information retention plans, automating the enforcement of legal holds and leveraging archiving tools instead of relying on backups, fare dramatically better when it comes to responding to eDiscovery requests. Companies in the survey with good information governance hygiene were:

  • 81% more likely to have a formal retention plan in place
  • 63% more likely to automate legal holds
  • 50% more likely to use a formal archiving tool

These top-tier companies in the survey were able to respond much faster and more successfully to an eDiscovery request, often suffering fewer negative consequences:

  • 78% less likely to be sanctioned
  • 47% less likely to end up in a compromised legal position
  • 45% less likely to disclose too much information

This last bullet (disclosing too much information) has a number of negative ramifications beyond just giving the opposition more ammo than is strictly necessary.  Since much of the eDiscovery process is volume-based, particularly the eyes-on review component, every extra gigabyte of produced information costs the organization in both seen and unseen ways.  Some have estimated that it costs between $3 and $5 a document for manual attorney review – and at 50,000 pages to a gigabyte, these data-related expenses can add up quickly.
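To make that arithmetic concrete, here is a rough sketch in Python. The per-document cost range and the 50,000 pages per gigabyte come from the estimate above; the pages-per-document figure is purely an assumption for illustration, since the estimate quotes cost per document but volume in pages.

```python
def review_cost_range(gigabytes, pages_per_gb=50_000, pages_per_doc=5,
                      cost_per_doc_low=3.0, cost_per_doc_high=5.0):
    """Return a (low, high) estimate, in dollars, for manual attorney review.

    pages_per_doc is an assumed average used only for illustration;
    real collections vary widely.
    """
    documents = gigabytes * pages_per_gb / pages_per_doc
    return documents * cost_per_doc_low, documents * cost_per_doc_high


low, high = review_cost_range(1)
print(f"1 GB: ${low:,.0f} to ${high:,.0f}")
```

Under those assumptions a single gigabyte works out to roughly 10,000 documents, so every gigabyte that is over-produced carries a five-figure review cost.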

On the other side of the coin, there were those companies with bad information governance hygiene.  While this isn’t terribly surprising, it is shocking to see how many entities fail to connect the dots between information governance and risk reduction.  Despite the numerous risks, the survey found nearly half of the respondents did not have an information retention plan in place, and of this group, only 30% were discussing how to do so.  Most shockingly, 14% appear to be ostriches with their heads in the sand and have no plans to implement any retention plan whatsoever.  When asked why folks weren’t taking action, respondents indicated lack of need (41%), too costly (38%), nobody has been chartered with that responsibility (27%), don’t have time (26%) and lack of expertise (21%) as top reasons.  While I get the cost issue, particularly in these tough economic times, it’s bewildering to think that so many companies feel immune from the requirements of having even a basic retention plan.

As the saying goes, “You don’t need to be a weatherman to tell which way the wind blows.”  And, the winds of change are upon us.  Treating eDiscovery as a repeatable business process isn’t a Herculean task, but it is one that cannot be accomplished without good information governance hygiene and the profound recognition that email isn’t the only game in town.

For more information regarding good records management hygiene, check out this informative video blog and Contoural article.

Remembering the Past: Deploying Technology to Ensure eDiscovery Compliance

Tuesday, September 6th, 2011

A famous quote from intellectual George Santayana provides an appropriate backdrop for organizations to better understand why they should deploy technology to strengthen their litigation response effort.  As Santayana explained in The Life of Reason: Reason in Common Sense, “[t]hose who cannot remember the past are condemned to repeat it.”

The “past” can be a powerful playbook in the game of eDiscovery.  Fortunately for organizations, the lessons of eDiscovery history abound.  Indeed, the decisions that courts issue every day across the United States and in other countries provide substantial guidance on what organizations should and should not do to properly prepare for the discovery phase of litigation.

One of the principal lessons that can be gleaned from American court cases in 2011 is that technology can help organizations address the demands of eDiscovery in litigation.  Technology has assumed such a significant role because it facilitates the oversight process that lawyers must engage in to ensure that pertinent documents are preserved for discovery.  This year alone, the failure to exercise that oversight has in many instances culminated in evidence destruction and sanctions.

That message was emphasized this summer by a Virginia-based federal court in a hotly contested trade secret dispute.  In E.I. du Pont de Nemours v. Kolon Industries (E.D. Va. July 21, 2011), the court determined that it would issue an adverse inference jury instruction against defendant Kolon Industries as a sanction for its evidence spoliation.  The spoliation at issue occurred when Kolon deleted emails and other records relevant to DuPont’s trade secret claims.  After being apprised of the lawsuit and then receiving multiple litigation hold notices, several Kolon executives and employees met together and identified emails and other documents that should be deleted.  The ensuing destruction was staggering.  Nearly 18,000 files and emails were deleted.  Furthermore, many of these materials went right to the heart of DuPont’s claim that key aspects of its Kevlar® formula were allegedly misappropriated to improve Kolon’s competing product line.

Surprisingly, however, the court did not finger the Kolon employees as the principal culprits for spoliation.  Instead, the court laid the blame on Kolon’s attorneys and executives, reasoning they could have prevented the destruction of information through better oversight.  The hold process was particularly flawed.  The notices were either too limited in their distribution, ineffective since they were prepared in English for Korean-speaking employees, or too late to prevent or otherwise alleviate the spoliation.  Given the logistical challenges of implementing a hold in this instance, perhaps only the automated functions of technology such as archiving software might have strengthened the oversight process and obviated the spoliation that took place.

The lack of attorney oversight also factored into another pertinent sanctions order this year, this time from a federal court in Chicago.  In Northington v. H & M International (N.D.Ill. Jan. 12, 2011), the court issued an adverse inference jury instruction against a company that destroyed relevant emails and other data.  The spoliation occurred in large part because the company neglected to establish a global litigation response effort.  For example, there was no process for issuing or ensuring compliance with a litigation hold.  Nor was counsel engaged in the critical steps of preservation, identification or collection of electronically stored information (ESI).  Into this vacuum stepped rank and file employees – some of whom were accused by the plaintiff of harassment – who were tasked with identifying and collecting discoverable emails from their workstations.  Predictably, key documents were never found and the court had little choice but to promise to inform the jury that the company destroyed evidence.

The problems associated with the lack of oversight in DuPont and Northington are compelling reasons why organizations should consider using technology tools as part of their overall litigation response strategy.  One of the most helpful tools in this regard is archiving software.  Indeed, having the right archiving solution in place might have preserved the spoliated records in these actions.

For example, archiving software can be programmed to prevent employees from deleting emails and other electronically stored information.  Because the software ingests data into a central repository and leaves copies of the materials on local computers, employees would still have access to their archived records.  They would not, however, be able to delete those documents from the software archive.  In addition, a litigation hold could have been placed on archived data to prevent automated retention rules from overwriting information.  Either of these features might have prevented much of the spoliation – and the resulting sanctions – that occurred in both the DuPont and Northington cases.

The automated functions of archiving technology can benefit a company’s litigation response in other ways.  For example, such a tool may limit the amount of potentially relevant information available for follow-on litigation.  Absent a legal hold, retention rules that are programmed into the software will ensure that ESI is expired once it reaches the end of a designated period.  In DuPont, such a feature could arguably have eliminated entire categories of older documents before a duty to preserve those materials ever ripened.  This facet not only has the potential to reduce legal exposure, but also the attendant costs associated with reviewing those documents in litigation.
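The interplay between automated retention rules, litigation holds and end-user deletion can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how any particular archiving product works: the `Archive` and `ArchivedItem` classes, the custodian-level hold and the 60-day retention period are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ArchivedItem:
    item_id: str
    custodian: str
    archived_on: date


@dataclass
class Archive:
    """Hypothetical central archive with one retention period and custodian holds."""
    retention_days: int
    items: dict = field(default_factory=dict)
    held_custodians: set = field(default_factory=set)

    def ingest(self, item):
        self.items[item.item_id] = item

    def place_hold(self, custodian):
        # A legal hold suspends expiry for everything belonging to this custodian.
        self.held_custodians.add(custodian)

    def user_delete(self, item_id):
        # Employees may delete local copies, but never the archived copy.
        return False

    def expire(self, today):
        """Remove items past the retention period unless they are on hold."""
        cutoff = today - timedelta(days=self.retention_days)
        expired = [item_id for item_id, item in self.items.items()
                   if item.archived_on < cutoff
                   and item.custodian not in self.held_custodians]
        for item_id in expired:
            del self.items[item_id]
        return sorted(expired)


archive = Archive(retention_days=60)
archive.ingest(ArchivedItem("msg-1", "alice", date(2011, 1, 1)))
archive.ingest(ArchivedItem("msg-2", "bob", date(2011, 1, 1)))
archive.ingest(ArchivedItem("msg-3", "bob", date(2011, 5, 20)))
archive.place_hold("alice")
expired = archive.expire(today=date(2011, 6, 1))
print(expired)                 # only bob's old message expires
print(sorted(archive.items))   # alice's held message and bob's recent one remain
```

The point of the sketch is the ordering of the checks: the hold is consulted before anything is expired, so routine deletion keeps running for everyone else while held material stays put.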

DuPont, Northington and other cases from the recent past delineate the steps companies can take to address the challenges of eDiscovery.  Organizations do not have to “repeat” past mistakes that victimized clients and counsel alike.  Instead, they can implement the right technology tools as part of a thoughtful, proactive approach to litigation.  By so doing, organizations will avoid Santayana’s judgment by “remembering” the lessons of eDiscovery history.

Jumping the Gun? Three Approaches to Drafting New Federal Discovery Rules

Thursday, September 1st, 2011

In my last post I announced that discussions are taking place that could change the way preservation and sanctions issues are handled within the federal court system.  The next round of discussions about possible amendments to the Federal Rules of Civil Procedure (FRCP) is scheduled to take place on September 9th in Dallas, Texas as part of a “mini-conference” led by the Discovery Subcommittee – a committee appointed by the Advisory Committee on Civil Rules.  This post discusses three different rule amendment approaches that attendees have been asked to consider in order to help them prepare for the mini-conference.  A complete list of attendees, preparation materials, and questions the group will consider are included in the Advisory Committee’s June 29, 2011 memorandum to the participants.

The debate about whether or not rule amendments are even required is far from over.  A 452-page document located on the U.S. Courts’ website chronicles many of the meetings, notes, and submissions driving the current discussion.  Page 265 of the document contains a memorandum prepared by the Civil Rules Advisory Committee earlier this year, stating that:

“the Subcommittee has reached no conclusion on whether rule amendments would be a productive way of dealing with preservation/sanctions concerns, much less what amendment proposals would be useful.”

Despite concerns that amending the current rules now would amount to jumping the gun, there is an undeniable desire for more clarity around when the duty to preserve electronically stored information (ESI) is triggered, what must be preserved, and when the duty expires.  This momentum has resulted in the crafting of draft proposals that are likely to help frame the discussion on September 9th. The “proposals” are really draft approaches that have been broken down into three general categories described in the Civil Rules Advisory Committee’s memorandum, titled: “PRESERVATION/SANCTIONS ISSUES” (see page 263).  The Category 1 approach can best be described as providing a higher degree of specificity than the other approaches.  For example, the Category 1 approach provides a fairly detailed explanation of the duty to preserve evidence (Rule 26.1(a)) and details possible triggers (26.1(b)), the scope of the duty to preserve (26.1(c)), and sanctions (Rule 37).  Category 2 proposes a more general preservation rule, while Category 3 only addresses sanctions as a tool for influencing behavior.  The three categories are discussed in more detail below.

Category 1: Specific Rule

This draft includes many different exemplary lists, alternative approaches, and footnotes that highlight the fact that one of the key challenges with drafting a specific rule is trying to foresee all of the challenges that might lie in the road ahead.  For example, the draft rule provides a long list of events that could trigger the duty to preserve evidence, including everything from serving a pleading to taking “any other action” in anticipation of litigation.   The rule also provides a list of information types that are “presumptively excluded” from the preservation duty, such as deleted data on hard drives, temporary internet files, and physically damaged media.

The lists are helpful in that they provide guidance.  However, each list also includes a “catch-all” provision to address scenarios that might not be foreseeable.  The inclusion of catch-all provisions highlights the inherent challenge of providing more clarity and certainty without creating rules that are so inflexible that they are difficult to apply to unforeseen factual scenarios or technological developments.  Some might argue that trying to provide a laundry list of examples will make passage of new rules difficult because each item on the list will stir debate.  Others contend that the lists add little value because the catch-all provisions will still require litigators to pass the sniff test of “reasonableness.”

Despite the inherent challenges related to drafting rules with specificity, most practitioners would likely support the inclusion of lists or examples that provide at least some direction.  What is likely to be far more controversial with respect to Category 1 is the use of alternative language proposing fixed limits around custodians and litigation holds.  For example, one alternative would limit data preservation requirements to a fixed number of custodians and the duty to preserve evidence would similarly expire after a fixed number of years.  Bright line rules like these may be easier to understand, but they also tend to be controversial since they lack the flexibility necessary to fairly address every conceivable situation.

Category 2: General Rule

Like the Category 1 proposal, the Category 2 proposal uses lists and outlines several alternative approaches throughout the rule.  However, the Category 2 proposal fundamentally differs from Category 1 by outlining a more general approach.  For example, one of the alternatives essentially states that the duty to preserve evidence is triggered whenever a “reasonable person” would expect to be a party to an action.  Similarly, the ongoing duty to preserve information after the duty has been triggered would be evaluated based on what is described as a “reasonable period” under the circumstances.

The beauty of this more general approach lies in its simplicity and flexibility.  The idea is that evaluating conduct based on the “reasonableness” of a person’s actions is much easier than attempting to draft bright line legal guidelines that account for every possible factual scenario.  The flip side is that reasonable minds could differ and results could be inconsistent if there are no bright line rules.  What this means in the context of the federal rule discussion is that one judge might find a party’s conduct with respect to data preservation efforts reasonable, while another judge might issue sanctions based on the same set of facts.  In large part, it is this lack of certainty and guidance in the current rules that sparked the current debate in the first place.

Category 3: Sanctions-Based Rule

Unlike the first two categories, the Category 3 approach focuses only on sanctions and would act like more of a “back-end” rule.  In other words, the rule would not contain any specific directives about preservation, but it would provide direction in the areas of when and how sanctions might be applied.

Despite the draconian image a “sanctions” based rule might conjure up, the Category 3 rule may seem surprisingly lenient to some.  For example, absent extraordinary circumstances, the court would be prohibited from imposing any of the sanctions listed in Rule 37(b)(2) or from giving an adverse-inference instruction unless:

“the party’s failure to preserve discoverable information was willful or in bad faith and caused [substantial] prejudice in the litigation.”

The sanctions-based approach would almost certainly have an impact on how parties handle upstream preservation-related issues.  However, the key ingredients that will impact what kind of behavior this rule drives are the severity of the threatened sanction as well as the applicable standard.  For example, a party facing severe sanctions for conduct that is negligent, willful or in bad faith is likely to take its preservation obligations seriously.  On the other hand, if the realm of possible sanctions is trivial, parties are less likely to take their preservation-related obligations seriously.

Conclusion

The three rule approaches represent very early attempts at framing possible approaches to amending the FRCP.  If the Discovery Subcommittee chooses to recommend rule amendments following the September 9th mini-conference in Dallas, the proposed language is likely to be closer to final form and easier to assess than the current proposals.  I will continue to monitor the rule making discussion and provide commentary in future posts.  Stay tuned for my next post where former US Magistrate Judge Ron Hedges explains why he thinks the rule changes are unnecessary and why the current proposals might run afoul of the Rules Enabling Act.

Clearwell Is Now Officially Part of Symantec

Monday, July 11th, 2011

Today, I am delighted to report that Clearwell Systems has become part of Symantec. We have, of course, been working closely together since obtaining regulatory approval for the acquisition last month, but this makes it official: Symantec can now offer customers Clearwell’s market-leading eDiscovery platform as well as its market-leading Symantec Enterprise Vault archiving solution. We are excited to be part of the Symantec team, and to work alongside so many talented people to create the next generation of eDiscovery and information governance solutions.

There are already a large number of joint customers using the Clearwell and Symantec solutions as part of an integrated eDiscovery and archiving workflow, and we are well underway towards building more robust integration between Clearwell and Symantec Enterprise Vault. In updating our product roadmaps, all our decisions are guided by feedback from customers who have told us over and over again that they want to:

  • Reduce costs across all phases represented in the Electronic Discovery Reference Model, from information management through review and production
  • Reduce risk by improving the defensibility and repeatability of their archiving and eDiscovery processes
  • Streamline their end to end archiving and eDiscovery lifecycle to meet legal and regulatory deadlines
  • Start managing information and conducting eDiscovery in as little as one day; whether on-premise, as a hosted solution or in the cloud
  • Meet their enterprise-wide archiving and eDiscovery needs, whether they have fewer than 25 or more than one million users

As we’ve discussed before, our plan as part of Symantec is to deliver a seamless, integrated archiving and eDiscovery management workflow that benefits all our customers. To keep everyone in the loop, we will continue to post updates and answer questions on the integrated product portfolio here and on the Symantec eDiscovery blog.

For more on the acquisition, and the response from our customers, partners and the industry at large, visit: http://www.symantec.com/clearwell.

Apple, Code Name K48 and E-Discovery

Wednesday, June 22nd, 2011

According to a complaint filed by the U.S. government, the FBI secretly recorded an employee at one of Apple’s suppliers passing confidential information about the soon-to-be-released Apple iPad in an October 2009 telephone conversation.  The recording, along with other evidence, led to the arrest of the employee and others on charges of wire fraud and conspiracy to commit securities fraud on December 16, 2010 as part of a major insider-trading investigation.  In the conversation, a director for Flextronics named Walter Shimoon is heard saying:

“they [Apple] have a code name for something new … It’s … It’s totally … It’s a new category altogether… It doesn’t have a camera, what I figured out. So I speculated that it’s probably a reader. … Something like that. Um, let me tell you, it’s a very secretive program … It’s called K, K48. That’s the internal name. So, you can get, at Apple you can get fired for saying K48.”

Four months later, the first Apple iPad, code named K48, was unveiled to the public.  To read more about the case background, read the press release issued by the U.S. Attorney’s Office on December 16, 2010.

The case is interesting from an eDiscovery standpoint because it highlights challenges related to finding critical evidence as part of an investigation or lawsuit when people are intentionally using code words to hide information.  Finding or overlooking important documents that have been disguised can make or break your case, so determining whether or not key players are using code words is an important part of a thorough investigation.  Equally important to the investigation is segregating relevant and irrelevant documents quickly before key evidence is lost or destroyed without being required to conduct a painstaking page by page review of each document.

How Does Technology Help?

The good news is that even though technology innovation has resulted in massive data growth requiring the review and analysis of more documentary evidence during lawsuits and investigations, advances in eDiscovery technology have also made sifting through this information faster and easier.  In other words, technology can help solve the data growth problem technology created.

One of the newest advances is the use of “transparent concept search” technology to find important electronic files in lieu of basic “keyword” or “traditional” concept searching technology.  In many situations investigators or lawyers simply aren’t aware code words are being used to hide activity, so critical evidence is often overlooked.  For example, in the present case assume the investigator is unaware that “K48” is the internal code name used for the first iPad.  A simple keyword search for the term “iPad” may not retrieve critical documents about the “iPad” because the code name K48 is being used to disguise the product name.  If this is the only search methodology used, information could easily be overlooked during the investigation due to the limitations of simple keyword search technology.

On the other hand, running the same search using a traditional concept searching tool is likely to retrieve documents containing the word “iPad” as well as other conceptually related documents.  The problem is that the user has no ability to control the breadth of the search using traditional concept searching technology.  That means even though a traditional concept search for the term “iPad” is likely to include documents containing the terms “K48” and “iPad,” it is also likely to retrieve a large number of irrelevant documents containing terms like “iPod,” “iTouch” and “iTunes” that may appear to be conceptually related to the search term “iPad.”  The problem may seem trivial initially, but when investigators are required to read hundreds or thousands of irrelevant documents about the iPod, iTouch or iTunes in an effort to find relevant documents about the iPad, the time and cost of the investigation can skyrocket.

Next Generation Transparent Concept Search Technology

To solve this problem, next generation transparent concept search technology takes traditional concept searching a step further by empowering investigators to reap the advantages of traditional concept searching while actually reducing instead of increasing e-discovery expenses.  The secret is that transparent concept searching technology significantly reduces the time and expense resulting from over-inclusive document retrieval by allowing users to eliminate documents containing concepts that are not relevant to the intended search.  This is accomplished by providing a transparent view of concepts related to a search so that users can actually visualize and select (or deselect) the range of concepts to be included in a search before the search is executed.

For example, using transparent concept search technology to search for the term “iPad” would reveal conceptually related terms like “K48” just like traditional concept searching.  However, a transparent concept search would also provide a list of all concepts related to the keyword “iPad” prior to the search, such as “K48,” “iPod,” “iTouch,” “Shimoon,” “iTunes,” etc.  Prior to executing the search, the user could de-select irrelevant concepts and limit the search to “iPad”, “Shimoon”, “internal” and “K48” to make sure only the most relevant documents are retrieved. (See Figure 1).  In addition to decreasing the cost associated with segregating relevant and irrelevant documents, the transparent approach to concept searching results in strategic advantages for investigators and legal teams because the most relevant evidence is found quickly so cases can be assessed faster, with more accuracy, and before evidence disappears.

Figure 1: Transparent concept search reveals all concepts related to the keyword “iPad” so users can not only identify key documents they may have otherwise overlooked, but they can also select which concepts (“internal” “K48” “Shimoon”) to include in the search so only the most relevant documents are retrieved.
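
To make the difference between the three approaches concrete, here is a minimal Python sketch.  Everything in it is an assumption made for illustration: the six-document corpus, the function names, and the use of simple term co-occurrence as a stand-in for concept analysis.  It is not how any particular product works, and real concept search engines rely on far richer statistical models.

```python
# Hypothetical mini-corpus, invented for illustration only.
DOCS = {
    1: "ipad launch schedule",
    2: "k48 internal name ipad",
    3: "shimoon k48 camera",
    4: "ipad ipod accessories",
    5: "ipod itunes update",
    6: "friday lunch menu",
}


def tokens(text):
    """Split a document into a set of lowercase terms."""
    return set(text.lower().split())


def keyword_search(term):
    """Basic keyword search: only documents containing the exact term."""
    return sorted(doc_id for doc_id, text in DOCS.items() if term in tokens(text))


def related_concepts(term):
    """Terms that co-occur with `term` in at least one document.

    Co-occurrence is a deliberately crude stand-in for concept analysis.
    """
    related = set()
    for text in DOCS.values():
        words = tokens(text)
        if term in words:
            related |= words - {term}
    return sorted(related)


def concept_search(terms):
    """Documents containing any of the given terms."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, text in DOCS.items() if tokens(text) & wanted)


# 1. Keyword search misses document 3, which only uses the code name.
keyword_hits = keyword_search("ipad")

# 2. "Traditional" concept search expands to every related term, with no user control.
all_concepts = related_concepts("ipad")
traditional_hits = concept_search({"ipad"} | set(all_concepts))

# 3. "Transparent" concept search: the user sees `all_concepts` first and keeps
#    only the ones that matter before the search runs.
selected = {"ipad", "k48", "internal"}
transparent_hits = concept_search(selected)

print(keyword_hits, traditional_hits, transparent_hits)
```

With this toy corpus, the keyword search returns documents 1, 2 and 4 and misses the code-name-only document 3; the uncontrolled expansion returns 1 through 5, pulling in the iPod/iTunes document; and the user-selected concepts return 1 through 4.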

Conclusion

Not knowing what to search for as part of eDiscovery or investigations is often the biggest organizational challenge that basic keyword and traditional concept search technology has not been able to solve.  Next generation transparent concept search technology overcomes the inherent limitations of basic keyword and traditional concept searching technology by empowering users to uncover, assess, and review evidence faster and with more accuracy, thereby giving litigators or investigators new strategic advantages on every case.

E-Discovery Goes Mainstream

Tuesday, June 21st, 2011

These days, being mentioned on a late-night talk show is pretty much a stamp of “going mainstream”. This is true of celebrities (notably the One-Man Band that is Charlie Sheen), public figures (Captain “Sully” Sullenberger, who piloted the US Airways plane to a safe landing on the Hudson River), and even infomercial goods (who isn’t familiar by now with the Snuggie?)

In the e-discovery world, we realized just how mainstream this industry is becoming when we got a mention on The Daily Show with Jon Stewart. With guest star Fareed Zakaria, fresh off the release of his new book, on set to discuss the American economy and the impact of technology on corporations, audiences were treated to this nugget:

Zakaria:   Machines can do things that people used to. There’s now computer programs that can do stuff that lawyers used to be able to do – discovery and things like that. May not be such a bad thing…

Stewart:   What can lawyers do that computers can’t do?

Lawyer jokes are never in short supply, and leave it to Jon Stewart not to miss a timely jab when one can be thrown. But we took notice because, of all the examples Zakaria could have used for technology’s impact on businesses everywhere — he chose to highlight the role of e-discovery software.

This was far from the first “mainstream” move for the e-discovery industry. In March, The New York Times published a featured – and top-emailed – article on advances in electronic discovery software. In May, leading analyst firm Gartner published the Magic Quadrant for E-Discovery Software, its first Magic Quadrant on the electronic discovery industry. And then in June, there it was: electronic discovery, right alongside CNN’s Fareed Zakaria and all Jon Stewart’s comedic antics on The Daily Show. Taken together, it’s clear that e-discovery is a hot topic on the minds of business folks and, increasingly, mainstream audiences. We’re eager to see where it comes up next – and secretly hoping the SNL sketch team is taking note.

Patents and Innovation in Electronic Discovery

Monday, June 13th, 2011

In the world of technology we live in, a huge amount of benefit is created when people apply certain well-known techniques to solve problems and create value for the broader community. Such techniques are often the result of painstakingly long and laborious research, driven primarily by academic institutions, with private industry either funding such research directly or incorporating it into its own work. When the industry as a whole recognizes a certain methodology, it gains popular usage.

In information retrieval, searching and retrieving relevant content from unstructured text has been a vexing problem, and we’ve had decades of the brightest minds applying their collective intelligence and the rigors of peer review to validate and establish the most effective way to solve a retrieval problem. And, research forums such as TREC, SIGIR and other information retrieval conferences establish a venue for advancing the state of the art. So, when Recommind announced that they have been issued a patent on Predictive Coding, I took notice, especially since it touches a nerve with those who believe research should be openly shared.

The patent lists six claims that describe a workflow whereby humans review and code a document and the coding decisions applied to the document sample are projected or applied to the larger collection of documents. Anyone who has even the slightest exposure to information retrieval research will recognize this as a very common interactive relevance feedback mechanism. Relevance feedback as a way to perform information retrieval has been studied for well over forty years, with a paper as early as 1968 by Rocchio J.J., titled Relevance Feedback in Information Retrieval. It falls under a category of methods broadly known as machine learning.
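
For readers who have not seen relevance feedback in action, the following Python sketch shows the classic Rocchio update: the query vector is moved toward the documents a reviewer marked relevant and away from those marked non-relevant.  The weights (alpha, beta, gamma), the sample texts, and the function names are illustrative assumptions, and raw term counts stand in for the weighted vectors a real system would use.  This is a textbook illustration, not the workflow in the patent or in any product.

```python
from collections import Counter


def vectorize(text):
    """Represent a text as raw term counts (a real system would weight terms)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average term weight across a list of term-count vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an updated query: alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant).

    Terms whose weight would drop to zero or below are removed, a common
    simplification.
    """
    rel = centroid(relevant) if relevant else {}
    non = centroid(nonrelevant) if nonrelevant else {}
    updated = {}
    for term in set(query) | set(rel) | set(non):
        weight = (alpha * query.get(term, 0)
                  + beta * rel.get(term, 0)
                  - gamma * non.get(term, 0))
        if weight > 0:
            updated[term] = weight
    return updated


# Hypothetical reviewer feedback on an initial search for "ipad".
query = vectorize("ipad")
relevant = [vectorize("ipad k48 internal"), vectorize("k48 camera")]
nonrelevant = [vectorize("ipod itunes ipad")]

updated_query = rocchio(query, relevant, nonrelevant)
print(sorted(updated_query.items(), key=lambda item: -item[1]))
```

After one round of feedback the updated query has picked up “k48” from the relevant documents, even though the reviewer never typed it, while “ipod” and “itunes” are pushed out.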

Any supervised machine learning system involves creating a training sample and using that sample to project into a larger population. The fact that one could claim patentable ideas on something that is so widely known and used is puzzling.  Any workflow that employs machine learning would include the steps of creating an initial control set, coding that by human review, and applying the learned tags to a larger population.  In fact, the Wiki article Learning to rank describes precisely the workflow that is claimed in the patent.  And as part of our participation in the TREC Legal Track 2009, Clearwell submitted a paper with iterative sampling based evaluation and automatic expansion of initial query.  In that paper, we describe exactly the workflow postulated by the six claims of the patent.
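
The generic workflow is simple enough to sketch in a few lines of Python.  The example below trains a small Naive Bayes classifier on a handful of human-coded documents and then applies the learned tags to uncoded ones.  The choice of Naive Bayes, the labels, the sample texts and the function names are all assumptions made for illustration; this is neither the method described in the patent nor the one in our TREC paper, just the common shape of any supervised approach.

```python
import math
from collections import Counter, defaultdict


def train(coded_sample):
    """Learn word statistics from (text, label) pairs coded by human reviewers."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in coded_sample:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the most probable label, using Laplace-smoothed Naive Bayes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Step 1: a small control set coded by human review (hypothetical).
coded_sample = [
    ("k48 internal name secret", "responsive"),
    ("ipad supplier forecast k48", "responsive"),
    ("lunch menu friday", "not_responsive"),
    ("ipod itunes update", "not_responsive"),
]
model = train(coded_sample)

# Step 2: project the learned tags onto the uncoded population.
population = ["k48 forecast leaked", "friday lunch update"]
predicted = [predict(model, doc) for doc in population]
print(predicted)
```

Here the first uncoded document is tagged responsive and the second not responsive, purely on the strength of the four coded examples.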

In terms of other prior art that would potentially invalidate the patent, the list is long. Let’s start with Text Classification. Text Classification using Support Vector Machines (SVM) was first published by Thorsten Joachims in 1998, in the Proceedings of Sixteenth International Conference on Machine Learning, as well as his book Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms, published by The Springer International Series in Engineering and Computer Science.  Joachims is now a well-recognized Professor of Computer Science at Cornell University, and that work is widely cited as a seminal work in the area of machine learning and text classification. Interestingly, this work was cited by the Patent Examiner as prior art, but the inventors missed listing it. Nevertheless, that work and further work by several academics such as Leopold and Kindermann has already established the use of Support Vector Machines as a useful technique for machine learning. To claim the novelty of its use in automatically coding documents is, in my opinion, a hollow claim.

Another technology mentioned in passing is Latent Semantic Indexing (LSI). This was proposed as a retrieval technique by Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman R. in their paper, Indexing by Latent Semantic Analysis, in Journal of the ASIS, 41(6):391-407, 1990. The use of LSI for semantic analysis, concept searching and text classification is also very widespread, and once again, it seems ridiculous to claim that it is something novel or innovative.

Next, let’s examine the use of sampling to validate the initial control set. Use of sampling for validation of a control set of documents is in fact such a widely known technique that most e-discovery productions employ sampling. In fact, the Sedona Commentary on Achieving Quality and the EDRM Search Guide recommend use of sampling to validate automated searches. Furthermore, several e-discovery opinions such as Judge Grimm’s opinion in Victor Stanley [Victor Stanley, Inc. v. Creative Pipe, Inc., 2008 WL 2221841 (D. Md., May 29, 2008)] suggest that any technique that reduces the universe of documents produced must employ sampling to validate automated searches.
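
As a rough illustration of what sampling-based validation looks like, the Python sketch below draws a random sample from a set of documents, counts how many a reviewer judges responsive, and reports the estimated rate with an approximate 95% interval.  The function name, the `is_responsive` callback (standing in for human review) and the simple normal-approximation interval are assumptions for illustration; that approximation is unreliable for very small samples or very low rates, and a real protocol would choose its sample size and statistics more carefully.

```python
import math
import random


def estimate_rate(population, is_responsive, sample_size, seed=0, z=1.96):
    """Estimate the share of responsive documents from a random sample.

    Returns (rate, lower, upper) using a normal-approximation interval.
    `is_responsive` is a stand-in for a human reviewer's coding decision.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    hits = sum(1 for doc in sample if is_responsive(doc))
    rate = hits / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


# Hypothetical set of 1,000 documents the automated search did NOT retrieve,
# of which the first 50 ids are (unknown to the searcher) responsive.
not_retrieved = list(range(1000))
rate, lower, upper = estimate_rate(not_retrieved, lambda doc_id: doc_id < 50, 200)
print(f"estimated miss rate: {rate:.3f} (approx. 95% interval {lower:.3f} to {upper:.3f})")
```

Applied to the documents a search left behind, an estimate like this gives a defensible measure of how much responsive material the search missed without reviewing every document.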

In short, we think the claims issued in the patent and the associated workflow are so commonly used that the workflow is neither novel nor non-obvious to a trained practitioner, and there is enough prior art on each of the individual technologies to warrant a re-examination and eventual invalidation of the patent. In any event, it is fairly easy for anyone to pick up existing prior art and devise a similar workflow that achieves the same or better outcome, and any attempt to enforce the patent will likely be challenged.

But there is an even bigger issue at stake here beyond the status of Recommind’s patent: namely, shouldn’t the e-discovery vendor community continue to work, as it has for years, toward what is in the best interest of the legal community and, more broadly, the justice system? Recommind’s thinly veiled threats about requiring industry participants to license their technology are an affront to those who have invested years developing the technology and practicing the approach in real-world e-discovery cases. Spend a few minutes trolling (no pun intended) around on archive.org and you’ll see that early predictive coding companies like H5 were practicing machine learning and predictive workflows in e-discovery over two years before Recommind announced their first version of Axcelerate.

Wouldn’t a better outcome be for corporations and law firms to benefit from the innovation that comes from free competition in the marketplace, while still honoring the sort of novel, non-obvious innovation that warrants patent protection? Legitimate patents that actually encourage and protect investments by an organization are fine, but process patents that attempt to patent a workflow are bad for business. With such an approach, the full promise of automated document review (which, as any truly honest vendor should admit, still has much more room to grow and develop) can be fully realized in a way that both provides vendors with the fair and just economic rewards they deserve while helping the legal system become radically more efficient.

Gartner Publishes First Magic Quadrant for E-Discovery Software

Friday, June 10th, 2011

Last month, Gartner published the 2011 Magic Quadrant for E-Discovery Software, its first ever Magic Quadrant (MQ) on the electronic discovery industry.

We believe the Gartner MQ signals e-discovery’s arrival as a major category of enterprise software, and creates a single, definitive “buyers’ guide” to help companies choose between the various solutions.  As the report points out, “The reason e-discovery is now a pressing issue for most companies is clear: ESI in all its many forms dominates legal proceedings because modern business is mostly conducted using electronic communications and electronic records. Regulators require this ESI to be archived for proof of compliance.”[1]

The authors of the report, Debra Logan and John Bace, are two of the industry’s leading lights. The report reflects their deep understanding of the domain and includes several keen insights into emerging trends and market dynamics.

Most software buyers are familiar with Gartner Magic Quadrants and the rigorous methodology behind them. In order to be included in the MQ, vendors must meet quantitative requirements in market penetration and customer base and are then evaluated upon certain criteria for completeness of vision and ability to execute. In the Magic Quadrant for E-Discovery Software, Gartner states that, “Ease of use, intuitive user interfaces, attorney-focused workflow, advanced but transparent semantic analysis features, native file format review, and foreign language support are all considered desirable features from the end user’s point of view.”[2] According to the report, “A vendor’s ability and willingness to perform proofs of concept (POCs) is also important, and many references told us that, with certain vendors, “try before you buy” arrangements or POCs were so successful that they did not even open their tendering process to competitive bidding.”[3]

In total, the Gartner Magic Quadrant for E-Discovery Software report analyzes 24 different e-discovery software vendors, and is meant to help CIOs, general counsel, IT professionals, lawyers, compliance staff and legal service providers understand the dynamics and landscape of the e-discovery software market. Combined with its analysis of the factors driving the growth of e-discovery and its vendor-by-vendor evaluation, we believe this makes the report a must-read for anyone involved in selecting an e-discovery solution.

For a limited time, please register here to download a complimentary copy of the Gartner Magic Quadrant for E-Discovery Software.

About the Magic Quadrant
The Magic Quadrant is copyrighted 2011 by Gartner, Inc. and is reused with permission. The Magic Quadrant is a graphical representation of a marketplace at and for a specific time period. It depicts Gartner’s analysis of how certain vendors measure against criteria for that marketplace, as defined by Gartner. Gartner does not endorse any vendor, product or service depicted in the Magic Quadrant, and does not advise technology users to select only those vendors placed in the “Leaders” quadrant. The Magic Quadrant is intended solely as a research tool, and is not meant to be a specific guide to action. Gartner disclaims all warranties, express or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.


[1] Gartner, Inc. “Magic Quadrant for E-Discovery Software”, by Debra Logan, John Bace, May 13, 2011, page 5.

[2] Gartner, Inc. “Magic Quadrant for E-Discovery Software”, by Debra Logan, John Bace, May 13, 2011, page 8.

[3] Gartner, Inc. “Magic Quadrant for E-Discovery Software”, by Debra Logan, John Bace, May 13, 2011, page 9.