Archive for the ‘analysis’ Category

Data Classification and Data Loss Prevention: Indispensable Building Blocks of Information Governance

Thursday, March 15th, 2012

In an effort to envision information governance as a modular and digestible concept, a great place to start is by imagining two building blocks. Not only will this approach make the task of thinking about holistic information governance less daunting, but it will carve out a beginning and an end with two basic concepts, thereby enabling a realistic and modular implementation.

Classification, Intelligent Archiving and Storage

The first block, and one of the single biggest cost savers an organization can embrace, is the proactive classification of data. Data classification begins with policy creation. Organizations that form one or more committees to define policies and invest the energy into the enforcement of those policies almost always reap significant benefits from the initiative. The efficiencies are so compelling that it’s a wonder that data classification and archiving are ever considered separately. One major benefit is the ability to intelligently leverage information, since the classification places the data with similar material pursuant to the stated policy. Organizations that embrace archiving for storage footprint reduction, compliance, litigation, and retention will also see the value of preventing trash from entering the archive upfront.

The more useless data that can be disposed of at the initial point of classification, the more intelligently and nimbly the archive can run, thereby reducing costs when it comes time to collect and cull potentially non-relevant data for eDiscovery. At a minimum, policies should be created to prevent trash from entering the archive.  Optimally, policies should contain key identifiers that direct information into specific folders within the archive.
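As a rough illustration of the two policy tiers described above (keep trash out, then route by key identifiers), here is a minimal Python sketch. The trash markers, keywords, folder names, and the `classify` function are all hypothetical placeholders, not part of any particular archiving product; a real policy would come from the organization’s own committee.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical policy rule: any keyword match routes the item to `folder`."""
    folder: str
    keywords: tuple


# Illustrative values only; a real policy would define its own identifiers.
TRASH_MARKERS = ("unsubscribe", "out of office", "lunch menu")
FOLDER_RULES = [
    Rule("legal/contracts", ("agreement", "contract", "nda")),
    Rule("finance/invoices", ("invoice", "purchase order")),
]


def classify(text: str):
    """Return None for trash, otherwise the archive folder for the item."""
    lowered = text.lower()
    # Tier 1: minimum policy, keep trash out of the archive entirely.
    if any(marker in lowered for marker in TRASH_MARKERS):
        return None
    # Tier 2: optimal policy, key identifiers direct the item to a folder.
    # Plain substring matching is crude and is used here only for brevity.
    for rule in FOLDER_RULES:
        if any(keyword in lowered for keyword in rule.keywords):
            return rule.folder
    return "general"  # no identifier matched; archive under a default folder
```

Consistent with the point that perfection is not the goal, the sketch makes no attempt at granular records management; it only decides whether an item enters the archive and where it lands.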

One common concern among record managers is that data classification needs to be perfect – but perfection is neither the goal nor achievable. For most organizations, any improvement in data management would be a big step in the right direction. Proactive data classification and archiving are not meant to be granular records management systems. Instead they serve as safeguards on what enters the archiving system, and where and for how long that data is subsequently maintained.

Data Loss Prevention, Asset Protection and Security

The other beneficial block of a holistic information governance plan is security-centric and focused on data loss prevention (DLP). With the proactive management of data, it is important to reduce costs as information is created and received.  Similarly, it is critical to monitor sensitive data on an outgoing basis to protect organizations from inadvertent disclosures of sensitive information and intellectual property assets. Much like the policy-driven classification, data loss prevention requires policy creation as well. The policy creation requirements for DLP can luckily leverage much of the hard work done with document retention and classification as they often mirror each other.

If an organization does not know which data is sensitive or constitutes an asset, how can it be protected? In order for organizations to address their valuable information they need to assess, at a minimum, the following four considerations:

  1. What kind of information does the organization consider to be valuable/sensitive?
  2. What happens if that information gets into the wrong hands?
  3. Where does the sensitive information presently reside/where should it reside?
  4. How can such information be tracked if it is transmitted outside of the organization?

The primary events that keep information security officers concerned regarding data loss prevention are: the unauthorized disclosure of sensitive customer information, unauthorized downloads of intellectual property, lost/stolen laptops, the transfer of proprietary information onto flash drives, and finally, concern over outbound emails containing sensitive information. These events most frequently occur at the hands of malicious and/or careless workers. A way to monitor and control activities associated with a breach is through data loss prevention policy and technology.
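To make the outbound-email scenario concrete, the following is a minimal Python sketch of a policy check on messages leaving the organization. The patterns, the `example.com` internal domain, and the `check_outbound` function are assumptions made for illustration; commercial DLP tools use far richer detection than two regular expressions.

```python
import re

# Hypothetical patterns; a real policy would define its own identifiers.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "confidential_marker": re.compile(r"\bconfidential\b", re.IGNORECASE),
}
INTERNAL_DOMAIN = "example.com"  # assumed internal mail domain


def check_outbound(recipient: str, body: str) -> list:
    """Return the names of policies triggered by an email leaving the organization."""
    if recipient.lower().endswith("@" + INTERNAL_DOMAIN):
        return []  # internal mail is out of scope for this sketch
    return [name for name, pattern in SENSITIVE_PATTERNS.items()
            if pattern.search(body)]
```

A non-empty result would be the point at which a policy decides whether to block, quarantine, or simply log the message.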

Next Steps

Examine the document retention/classification policies and data loss prevention policies of the organization and compare them for similarities. Next, consider getting the key stakeholders for Compliance, IT, Legal, RIM, and Security together to talk about these aforementioned scenarios and to construct a policy. Make the agenda for the meeting short and simple, focusing first on email. Initially focus on how to address the trash being kept so it does not enter the archived environment in the first place. If you do not have an archive, consider getting one.

Finally, tie in data loss prevention as a necessary means of protecting the assets of the organization, as well as providing consistency through classification and data protection. The parameters for defining valuable information will be the same whether looking at classification or data loss prevention. If nothing else, addressing these two critical building blocks will reduce storage and eDiscovery costs and facilitate better coordination of information through intelligent archiving, while simultaneously protecting the organization’s critical assets.

Big Data Decisions Ahead: Government-Sponsored Town Hall Meeting for eDiscovery Industry Coincides With Federal Agency Deadline

Wednesday, February 29th, 2012

Update For Report Submission By Agencies

We are fast approaching the March 27, 2012 deadline for federal agencies to submit their reports to the Office of Management and Budget and the National Archives and Records Administration (NARA) to comply with the Presidential Mandate on records management. We are only at the inception, as we look to a very exciting public town hall meeting in Washington, D.C. – also scheduled for March 27, 2012. This meeting is primarily focused on gathering input from the public sector community, the vendor/IT community, and members of the public at large. Ultimately, NARA will issue a directive that will outline a centralized approach for the federal government for managing records and eDiscovery.

Agencies have been tight-lipped about how far along they are in the process of evaluating their workflows and tools for managing their information (both electronic and paper). There is, however, some empirical data from an InformationWeek Survey conducted last year that takes the temperature on where the top IT professionals within the government have their sights set, and the Presidential Mandate should bring some of these concerns to the forefront of the reports. For example, the #1 business driver for migrating to the cloud – cited by 62% of respondents – was cost, while 77% of respondents said their biggest concern was security. Nonetheless, 46% were still highly likely to migrate to a private cloud.

Additionally, as part of the Federal Data Center Consolidation Initiative, agencies are looking to eliminate 800 data centers. While the cost savings are clear, from an information governance viewpoint, it’s hard not to ask what the government plans to do with all of those records. Clearly, this shift, should it happen, will force the government into a more service-based management approach, as opposed to the traditional asset-based management approach. Some agencies have already migrated to the cloud. This is squarely in line with the Opex over Capex approach emerging for efficiency and cost savings.

Political Climate Unknown

Another major concern that will affect any decisions or policy implementation within the government is, not surprisingly, politics. Luckily, regardless of political party affiliation, it seems to be broadly agreed that the combination of IT spend in Washington, D.C. and the government’s slow move to properly manage electronic records is a problem. Two of the many examples of the problem are manifested in the inability to issue effective litigation holds or respond to Freedom of Information Act (FOIA) requests in a timely and complete manner. Even still, the political agenda of the Republican party may affect the prioritization of the Democratic President’s mandate and efforts could be derailed with a potential change in administration.

Given the election year and the heavy analysis required to produce the report, there is a sentiment in Washington that all of this work may be for naught if the appropriate resources cannot be secured and then allocated to effectuate the recommendations. The reality is that data is growing at an unprecedented rate, and the need for the intelligent management of information is no longer deniable. The long term effects of putting this overhaul on the back burner could be disastrous. The government needs a modular plan and a solid budget to address the problem now, as it is already behind.

VanRoekel’s Information Governance

One issue that will likely not be agreed upon between Democrats and Republicans to accomplish the mandate is the almighty budget, and the technology the government must purchase in order to accomplish the necessary technological changes is going to cost a pretty penny. Steven VanRoekel, the Federal CIO, stated upon the release of the FY 2013 $78.8 billion IT budget:

“We are also making cyber security a cross-agency, cross-government priority goal this year. We have done a good job in ramping up on cyber capabilities agency-by-agency, and as we come together around this goal, we will hold the whole of government accountable for cyber capabilities and examine threats in a holistic way.”

His quote indicates the priority from the top down of evaluating IT holistically, which dovetails nicely with the presidential mandate since security and records management are only two parts of the entire information governance picture. Each agency still has its own work cut out for it across the EDRM. One of the most pressing issues in the upcoming reports will be what each agency decides to bring in-house or to continue outsourcing. This decision will in part depend on whether the inefficiencies identified lead agencies to conclude that they can perform those functions for less money and more efficiently than their contractors. In evaluating their present capabilities, each agency will need to look at what workflows and technologies they currently have deployed across divisions, what they presently outsource, and what the marketplace potentially offers them today to address their challenges.

This question is central because it raises an all-important question about information governance itself. Information governance inherently implies that an organization or government control most or all aspects of the EDRM model in order to derive the benefits of security, storage, records management and eDiscovery capabilities. Presently, the government is outsourcing many of its litigation services to third party companies that have essentially become de facto government agencies. This is partly due to scalability issues, and partly because the resources and technologies that are deployed in-house within these agencies are inadequate to properly execute a robust information governance plan.

Conclusion

The ideal scenario for each government agency to comply with the mandate would be that they deploy automated classification for their records management, archiving with expiration appropriately implemented for more than just email, and finally, some level of eDiscovery capability in order to conduct early case assessment and easily produce data for FOIA. The level of early case assessment needed by each agency will vary, but the general idea would be that before contacting a third party to conduct data collection, the scope of an investigation or matter could be determined in-house. All things considered, the question remains whether the Obama administration will foot this bill or whether we will have to wait for a bigger price tag later down the road. Either way, the government will have to come up to speed and make these changes eventually, and the town hall meeting should be an accurate thermometer on where the government stands.

Computer-Assisted Review “Acceptable in Appropriate Cases,” says Judge Peck in new Da Silva Moore eDiscovery Ruling

Saturday, February 25th, 2012

The Honorable Andrew J. Peck, United States Magistrate Judge for the Southern District of New York, issued an opinion and order (order) on February 24th in Da Silva Moore v. Publicis Groupe, stating that computer-assisted review in eDiscovery is “acceptable in appropriate cases.”  The order was issued over plaintiffs’ objection that the predictive coding protocol submitted to the court will not provide an appropriate level of transparency into the predictive coding process.  This and other objections will be reviewed by the district court for error, leaving open the possibility that the order could be modified or overturned.  Regardless of whether or not that happens, Judge Peck’s order makes it clear that the future of predictive coding technology is bright, the role of other eDiscovery technology tools should not be overlooked, and the methodology for using any technology tool is just as important as the tool used.

Plaintiffs’ Objections and Judge Peck’s Preemptive Strikes

In anticipation of the district court’s review, the order preemptively rejects plaintiffs’ assertion that defendant MSL’s protocol is not sufficiently transparent. In so doing, Judge Peck reasons that plaintiffs will be able to see how MSL codes emails. If they disagree with MSL’s decisions, plaintiffs will be able to seek judicial intervention. (Id. at 16.) Plaintiffs appear to argue that although this and other steps in the predictive coding protocol are transparent, the overall protocol (viewed in its entirety) is not transparent or fair. The crux of plaintiffs’ argument is that although MSL provides a few peeks behind the curtain during this complex process, many important decisions impacting the accuracy and quality of the document production are still being made unilaterally by MSL. Plaintiffs essentially conclude that such unilateral decision-making does not allow them to properly vet MSL’s methodology, which leads to a fox-guarding-the-hen-house problem.

Similarly, Judge Peck dismissed plaintiffs’ argument that expert testimony should have been considered during the status conference pursuant to Rule 702 and the Daubert standard.  In one of many references to his article, “Search, Forward: will manual document review and keyword searches be replaced by computer-assisted coding?” Judge Peck explains:

“My article further explained my belief that Daubert would not apply to the results of using predictive coding, but that in any challenge to its use, this Judge would be interested in both the process used and the results.” (Id. at 4.)

The court further hints that results may play a bigger role than science:

“[I]f the use of predictive coding is challenged in a case before me, I will want to know what was done and why that produced defensible results. I may be less interested in the science behind the “black box” of the vendor’s software than in whether it produced responsive documents with reasonably high recall and high precision.” (Id.)

Judge Peck concludes that Rule 702 and Daubert are not applicable to how documents are searched for and found in discovery. Instead, both deal with the “trial court’s role as gatekeeper to exclude unreliable testimony from being submitted to the jury at trial.” (Id. at 15.) Despite Judge Peck’s comments, the waters are still murky on this point as evidenced by differing views expressed by Judges Grimm and Facciola in O’Keefe, Equity Analytics, and Victor Stanley. For example, in Equity Analytics, Judge Facciola addresses the need for expert testimony to support keyword search technology:

“[D]etermining whether a particular search methodology, such as keywords, will or will not be effective certainly requires knowledge beyond the ken of a lay person (and a lay lawyer) and requires expert testimony that meets the requirements of Rule 702 of the Federal Rules of Evidence.” (Id. at 333.)

Given the uncertainty regarding the applicability of Rule 702 and Daubert, it will be interesting to see if and how the district court addresses the issue of expert testimony.
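For readers unfamiliar with the “recall” and “precision” measures Judge Peck refers to above, here is a minimal Python sketch of how the two are commonly computed. The function name and the idea of representing documents by ID sets are illustrative assumptions; the order itself does not prescribe any particular calculation.

```python
def recall_precision(retrieved: set, relevant: set):
    """Recall: share of relevant documents that were retrieved.
    Precision: share of retrieved documents that are relevant."""
    true_positives = len(retrieved & relevant)
    recall = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision
```

For example, if a tool retrieves four documents and three of them are among six truly responsive documents, recall is 3/6 and precision is 3/4. In practice the set of truly responsive documents is not known in advance and is usually estimated by reviewing a sample.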

What This Order Means and Does not Mean for the Future of Predictive Coding

The order states that “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.” (Id. at 2.)  Recognizing that there have been some erroneous reports, Judge Peck went to great lengths to clarify his order and to “correct the many blogs about this case.” (Id. at 2, fn. 1.)  Some important excerpts are listed below:

The Court did not order the use of predictive coding

“[T]he Court did not order the parties to use predictive coding.  The parties had agreed to defendants’ use of it, but had disputes over the scope and implementation, which the Court ruled on, thus accepting the use of computer-assisted review in this lawsuit.” (Id.)

Computer-assisted review is not required in all cases

“That does not mean computer-assisted review must be used in all cases, or that the exact ESI protocol approved here will be appropriate in all future cases that utilize computer-assisted review.” (Id. at 25.)

The opinion should not be considered an endorsement of any particular vendors or tools

“Nor does this Opinion endorse any vendor…, nor any particular computer-assisted review tool.” (Id.)

Predictive coding technology can still be expensive

MSL wanted to only review and produce the top 40,000 documents, which it estimated would cost $200,000 (at $5 per document). (1/4/12 Conf. Tr. at 47-48, 51.)

Process and methodology are as important as the technology utilized

“As with keywords or any other technological solution to eDiscovery, counsel must design an appropriate process, including use of available technology, with appropriate quality control testing, to review and produce relevant ESI while adhering to Rule 1 and Rule 26(b)(2)(C) proportionality.” (Id.)

Conclusion

The final excerpt drives home the points made in a recent Forbes article involving this and another predictive coding case (Kleen Products). The first point is that there is a range of technology-assisted review (TAR) tools in the litigator’s tool belt that will often be used together in eDiscovery, and predictive coding technology is one of those tools. The second is that none of these tools will provide accurate results unless they are relatively easy to use and used properly. In other words, the carpenter is just as important as the hammer. Applying these guideposts and demanding cooperation and transparency between the parties will help the bench usher in a new era of eDiscovery technology that is fair and just for everyone.

Plaintiffs Object to Predictive Coding Order, Argue Lack of Transparency in eDiscovery Process

Friday, February 24th, 2012

The other shoe dropped in the Da Silva Moore v. Publicis Groupe case this week as the plaintiffs filed their objections to a preliminary eDiscovery order addressing predictive coding technology. In challenging the order issued by the Honorable Andrew J. Peck, the plaintiffs argue that the protocol will not provide an appropriate level of transparency into the predictive coding process. In particular, the plaintiffs assert that the ordered process does not establish “the necessary standards” and “quality assurance” levels required to satisfy Federal Rule of Civil Procedure 26(b)(1) and Federal Rule of Evidence 702.

The Rule 26(b) Relevance Standard

With respect to the relevance standard under Rule 26, plaintiffs maintain that there are no objective criteria to establish that defendant’s predictive coding technology will reliably “capture a sufficient number of relevant documents from the total universe of documents in existence.” Unless the technology’s “search methodologies” are “carefully crafted and tested for quality assurance,” there is risk that the defined protocol could “exclude a large number of responsive email” from the defendant’s production. This, plaintiffs assert, is not acceptable in an employment discrimination matter where liberal discovery is typically the order of the day.

Reliability under Rule 702

The plaintiffs also contend that the court abdicated its gatekeeper role under Rule 702 and the U.S. Supreme Court’s decision in Daubert v. Merrell Dow Pharmaceuticals by not soliciting expert testimony to assess the reliability of the defendant’s predictive coding technology. Such testimony is particularly necessary in this instance, plaintiffs argue, where the technology at issue is new and untested by the judiciary. To support their position, the plaintiffs filed a declaration from their expert witness that challenges its reliability. Relying on that declaration, the plaintiffs complain that the process lacks “explicit and defined standards.” According to the plaintiffs, such standards would typically include “calculations . . . to determine whether the system is accurate in identifying responsive documents.” They would also include “the standard of acceptance that they are trying to achieve,” i.e., whether the defendant’s “method actually works.”  Plaintiffs conclude that without such “quality assurance measurements in place to determine whether the methodology is reliable,” the current predictive coding process is “fundamentally flawed” and should be rejected.

Wait and See

Now that the plaintiffs have filed their objections, the eDiscovery world must wait and see what will happen next. The defendant will certainly respond in kind, vigorously defending the ordered process with declarations from its own experts. Whether the plaintiffs or the defendant will carry the day depends on how the district court views these issues, particularly the issue of transparency. Simply put, the question is whether the process at issue is sufficiently transparent to satisfy Rule 26 and Rule 702. That is the proverbial $64,000 question as we wait and see how this issue plays out in the courts over the coming weeks and months.

Judge Peck Issues Order Addressing “Joint Predictive Coding Protocol” in Da Silva Moore eDiscovery Case

Thursday, February 23rd, 2012

Litigation attorneys were abuzz last week when a few breaking news stories erroneously reported that The Honorable Andrew J. Peck, United States Magistrate Judge for the Southern District of New York, ordered the parties in a gender discrimination case to use predictive coding technology during discovery. Despite early reports, the parties in the case (Da Silva Moore v. Publicis Groupe, et al.) actually agreed to use predictive coding technology during discovery – apparently of their own accord. The case is still significant because predictive coding technology in eDiscovery is relatively new to the legal field, and many have been reluctant to embrace a new technological approach to document review due to, among other things, a lack of judicial guidance.

Unfortunately, despite this atmosphere of cooperation, the discussion stalled when the parties realized they were miles apart in terms of defining a mutually agreeable predictive coding protocol.  A February status conference transcript reveals significant confusion and complexity related to issues such as random sampling, quality control testing, and the overall process integrity.  In response, Judge Peck ordered the parties to submit a Joint Protocol for eDiscovery to address eDiscovery generally and the use of predictive coding technology specifically.

The parties submitted their proposed protocol on February 22, 2012 and Judge Peck quickly reduced that submission to a stipulation and order.  The stipulation and order certainly provides more clarity and insight into the process than the status conference transcript.  However, reading the stipulation and order leaves little doubt that the devil is in the details – and there are a lot of details.  Equally clear is the fact that the parties are still in disagreement and the plaintiffs do not support the “joint” protocol laid out in the stipulation and order.  Plaintiffs actually go so far as to incorporate a paragraph into the stipulation and order stating that they “object to this ESI Protocol in its entirety” and they “reserve the right to object to its use in the case.”

These problems underscore some of the points made in a Forbes article published earlier this week titled, “Federal Judges Consider Important Issues That Could Shape the Future of Predictive Coding Technology.” The Forbes article relies in part on a recent predictive coding survey to make the point that, while predictive coding technology has tremendous potential, the solutions need to become more transparent and the workflows must be simplified before they go mainstream.

Survey Says… Information Governance and Predictive Coding Adoption Slow, But Likely to Gain Steam as Technology Improves

Wednesday, February 15th, 2012

The biggest legal technology event of the year, otherwise known as LegalTech New York, always seems to have a few common rallying cries and this year was no different.  In addition to cloud computing and social media, predictive coding and information governance were hot topics of discussion that dominated banter among vendors, speakers, and customers.  Symantec conducted a survey on the exhibit show floor to find out what attendees really thought about these two burgeoning areas and to explore what the future might hold.

Information Governance is critical, understood, and necessary – but it is not yet being adequately addressed.

Although 84% of respondents are familiar with the term information governance and 73% believe that an integrated information governance strategy is critical to reducing information risk and cost, only 19% have implemented an information governance solution. These results raise the question: if information governance is critical, then why aren’t more organizations adopting information governance practices?

Perhaps the answer lies in the cross-functional nature of information governance and confusion about who is responsible for the organization’s information governance strategy. For example, the survey also revealed that information governance is a concept that incorporates multiple functions across the organization, including email/records retention, data storage, data security and privacy, compliance, and eDiscovery. Given the broad impact of information governance across the organization, it is no surprise that respondents also indicated that multiple departments within the organization – including Legal, IT, Compliance, and Records Management – have an ownership stake.

These results tend to suggest at least two things.  First, information governance is a concept that touches multiple parts of the organization.  Defining and implementing appropriate information governance policies across the organization should include an integrated strategy that involves key stakeholders within the organization.  Second, recognition that information governance is a common goal across the entire organization highlights the fact that technology must evolve to help address information governance challenges.

The days of relying too heavily on disconnected point solutions to address eDiscovery, storage, data security, and record retention concerns are limited as organizations continue to mandate internal cost cutting and data security measures.  Decreasing the number of point solutions an organization supports and improving integration between the remaining solutions is a key component of a good information governance strategy because it has the effect of driving down technology and labor costs.   Similarly, an integrated solution strategy helps streamline the backup, retrieval, and overall management of critical data, which simultaneously increases worker productivity and reduces organizational risk in areas such as eDiscovery and data loss prevention.

The trail that leads from point solutions to an integrated solution strategy is already being blazed in the eDiscovery space and this trend serves as a good information governance roadmap.  More and more enterprises faced with investigations and litigation avoid the cost and time of deploying point solutions to address legal hold, data collection, data processing, and document review in favor of a single, integrated, enterprise eDiscovery platform.  The resulting reduction in cost and risk is significant and is fueling support for even broader information governance initiatives in other areas.  These broader initiatives will still include integrated eDiscovery solutions, but the initiatives will continue to expand the integrated solution approach into other areas such as storage management, record retention, and data security technologies to name a few.

Despite mainstream familiarity, predictive coding technology has not yet seen mainstream adoption but the future looks promising.

Much like the term information governance, most respondents were familiar with predictive coding technology for electronic discovery, but the survey results indicated that adoption of the technology to date has been weak.  Specifically, the survey revealed that while 97% of respondents are familiar with the term predictive coding, only 12% have adopted predictive coding technology.  Another 19% are “currently adopting” or plan to adopt predictive coding technology, but the timeline for adoption is unclear.

When asked what challenges “held back” respondents from adopting predictive coding technology, most cited accuracy, cost, and defensibility as their primary concerns. Concerns about “privilege/confidentiality” and difficulty understanding the technology were also cited as reasons impeding adoption. Significantly, 70% of respondents believe that predictive coding technology would “go mainstream” if it were easier to use, more transparent, and less expensive. These findings are consistent with the observations articulated in my recent blog (2012: Year of the Dragon and Predictive Coding – Will the eDiscovery Landscape Be Forever Changed?).

The survey results combined with the potential cost savings associated with predictive coding technology suggest that the movement toward predictive coding technology is gaining steam. Lawyers are typically reluctant to embrace new technology that is not intuitive because it is difficult to defend a process that is difficult to understand. The complexity and confusion surrounding today’s predictive coding technology was highlighted in Da Silva Moore v. Publicis Groupe, et al. during a recent status conference. The case is venued in the Southern District of New York before Judge Andrew Peck and serves as further evidence that predictive coding technology is gaining steam. Expect future proceedings in the Da Silva Moore case to further validate these survey results by revealing both the promise and complexity of current predictive coding technologies. Similarly, expect next generation predictive coding technology to address current complexities by becoming easier to use, more transparent, and less expensive.

LTNY Wrap-Up – What Did We Learn About eDiscovery?

Friday, February 10th, 2012

Now that the dust has settled, the folks who attended LegalTech New York 2012 can try to get to the mountain of emails that accumulated during the event that was LegalTech. Fortunately, there was no ice storm this year, and for the most part, people seemed to heed my “what not to do at LTNY” list. I even found the Starbucks across the street more crowded than the one in the hotel. There was some alcohol-induced hooliganism at a vendor’s party, but most of the other social mixers seemed uniformly tame.

Part of Dan Patrick’s syndicated radio show features a “What Did We Learn Today?” segment, and that inquiry seems fitting for this year’s LegalTech.

  • First of all, the prognostications about buzzwords were spot on, with no shortage of cycles spent on predictive coding (aka Technology Assisted Review). The general session on Monday, hosted by Symantec, had close to a thousand attendees on the edge of their seats to hear Judge Peck, Maura Grossman and Ralph Losey wax eloquent about the ongoing man versus machine debate. Judge Peck uttered a number of quotable sound bites, including the quote of the day: “Keyword searching is absolutely terrible, in terms of statistical responsiveness.” Stay tuned for a longer post with more comments from the general session.
  • Ralph Losey went one step further when commenting on keyword search, stating: “It doesn’t work,… I hope it’s been discredited.” A few have commented that this lambasting may have gone too far, and I’d tend to agree.  It’s not that keyword search is horrific per se. It’s just that its efficacy is limited, and the real trouble lies in the hubris of the average user, who thinks eDiscovery search is like Google search. It’s important to keep in mind that all these eDiscovery applications are just tools in the practitioner’s toolbox, and they need to be deployed for the right task. Otherwise, the old saw (pun intended) that “when all you have is a hammer, everything looks like a nail” will inevitably come true.
  • This year’s show also finally put a nail in the coffin of the human review process as the eDiscovery gold standard. That doesn’t mean that attorneys everywhere will abandon the linear review process any time soon, but hopefully it’s becoming increasingly clear that the “evil we know” isn’t very accurate (on top of being very expensive). If that deadly combination doesn’t get folks experimenting with technology assisted review, I don’t know what will.
  • Information governance was also a hot topic, only paling in comparison to Predictive Coding. A survey Symantec conducted at the show indicated that this topic is gaining momentum, but still has a ways to go in terms of action. While 73% of respondents believe an integrated information governance strategy is critical to reducing information risk, only 19% have implemented a system to help them with the problem. This gap presumably indicates a ton of upside for vendors who have a good, attainable information governance solution set.
  • The Hilton still leaves much to be desired as a host location. As they say, familiarity breeds contempt, and for those who’ve notched more than a handful of LegalTech shows, the venue can feel a bit like the movie Groundhog Day, but without Bill Murray. Speculation continues to run rampant about a move to the Javits Center, but the show would likely need to expand pretty significantly before ALM would make the move. And, if there ever was a change, people would assuredly think back with nostalgia on the good old days at the Hilton.
  • Despite the bright lights and elevator advertisement trauma, the mood seemed pretty ebullient, with tons of partnerships, product announcements and consolidation. This positive vibe was a nice change after the last two years when there was still a dark cloud looming over the industry and economy in general.
  • Finally, this year’s show also seemed to embrace social media in a way that it hadn’t in years past. Yes, all the social media vehicles were around before, but this year many of the vendors’ campaigns seemed to be much more integrated. It was funny to see even the most technically resistant lawyers log in to Twitter (for the first time) to post comments about the show as a way to win premium vendor swag. Next year, I’m sure we’ll see an even more pervasive social media influence, which is a bit ironic given the eDiscovery challenges associated with collecting and reviewing social media content.

Breaking News: Federal Circuit Denies Google’s eDiscovery Mandamus Petition

Wednesday, February 8th, 2012

The U.S. Court of Appeals for the Federal Circuit dealt Google a devastating blow Monday in connection with Oracle America’s patent and copyright infringement suit against Google involving features of Java and Android. The Federal Circuit affirmed the district court’s order that a key email was not entitled to protection under the attorney-client privilege.

Google had argued that the email was privileged under Upjohn Co. v. United States, asserting that the message reflected discussions about litigation strategy between a company engineer and in-house counsel. While acknowledging that Upjohn would protect such discussions, the court rejected that characterization of the email.  Instead, the court held that the email reflected a tactical discussion about “negotiation strategy” with Google management, not an “infringement or invalidity analysis” with Google counsel.

Getting beyond the core privilege issues, Google might have avoided this dispute had it withheld the eight earlier drafts of the email that it produced to Oracle. As we discussed in our previous post, organizations conducting privilege reviews should consider using robust, next generation eDiscovery technology, such as email analytical software, which in this case could have isolated the drafts and potentially removed them from production. Other technological capabilities, such as Near Duplicate Identification, could also have helped identify draft materials and marry them up with finals marked as privileged. As this case shows, in the fast moving era of eDiscovery, having the right technology is essential for maintaining a strategic advantage in litigation.
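To make the idea of near-duplicate identification concrete, here is a minimal sketch in Python. It is not how any particular eDiscovery product works; it simply compares documents by their overlapping three-word “shingles” and reports pairs whose Jaccard similarity meets a threshold. The function names, the shingle size, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def near_duplicates(docs, threshold=0.5):
    """docs maps an id to text; return (id_a, id_b, score) for pairs at or above threshold."""
    ids = sorted(docs)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = jaccard(docs[a], docs[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs
```

In a privilege review, a pair flagged this way (for example, a draft and the final version marked privileged) would be routed to a reviewer together rather than coded independently. Comparing every pair is only practical for small sets; it is used here purely to keep the sketch short.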

The Social Media Rubik’s Cube: FINRA Solved it First, Are Non-Regulated Industries Next?

Wednesday, January 25th, 2012

It’s no surprise that the first industry to be heavily regulated regarding social media use was the financial services industry. The predominant factor that drove regulators to address the viral qualities of social media was the fiduciary nature of investing that accompanies securities, coupled with the potential detrimental financial impact these offerings could have on investors.

Although there is no explicit language in FINRA’s Regulatory Notices 10-06 (January 2010) or 11-30 (August 2011) requiring archival, the record keeping component of the notices necessitates social media archiving in most cases due to the sheer volume of data produced on social media sites. Melanie Kalemba, Vice President of Business Development at SocialWare in Austin, Texas states:

“Our clients in the financial industry have led the way, they have paved the road for other industries, making social media usage less daunting. Best practices for monitoring third-party content, record keeping responsibilities, and compliance programs are available and developed for other industries to learn from. The template is made.”

eDiscovery and Privacy Implications. Privacy laws are an important aspect of social media use that impact discoverability. Discovery and privacy represent layers of the Rubik’s cube in the ever-changing and complex social media environment. No longer are social media cases only personal injury suits or HR incidents, although those are plentiful. For example, in Largent v. Reed the court ruled that information posted by a party on their personal Facebook page was discoverable and ordered the plaintiff to provide a user name and password to enable the production of the information. In granting the motion to compel the plaintiff’s login credentials, Judge Walsh acknowledged that Facebook has privacy settings, and that users must take “affirmative steps” to keep their information private. However, his ruling determined that no social media privacy privilege exists: “No court has recognized such a privilege, and neither will we.” He reinforced his ruling by adding, “[o]nly the uninitiated or foolish could believe that Facebook is an online lockbox of secrets.”

Then there are the new cases emerging over social media account ownership which affect privacy and discoverability. In the recently filed Phonedog v. Kravitz, 11-03474 (N.D. Cal.; Nov. 8, 2011), the lines between the “professional” versus the “private” user are becoming increasingly blurred. This case also raises questions about proprietary client lists, valuations on followers, and trade secrets  – all of which are further complicated when there is no social media policy in place. The financial services industry has been successful in implementing effective social media policies along with technology to comply with agency mandates – not only because they were forced to by regulation, but because they have developed best practices that essentially incorporate social media into their document retention policies and information governance infrastructures.

Regulatory Framework. Adding another Rubik’s layer is the multitude of regulatory and compliance issues that many industries face. The most active and vocal regulators for guidance in the US on social media have been FINRA, the SEC and the FTC. FINRA initiated guidance to the financial services industry, and earlier this month the SEC issued its alert. The SEC’s exam alert to registered investment advisers issued on January 4, 2012 was not meant to be a comprehensive summary for compliance related to the use of social media. Instead, it lays out staff observations of three major categories: third party content, record keeping and compliance – expounding on FINRA’s notice.

Last year the FTC issued an extremely well done Preliminary FTC Staff Report on Protecting Consumer Privacy in an Era of Rapid Change: A Proposed Framework for Businesses and Policymakers.  Three main components are central to the report. The first is a call for all companies to build privacy and security mechanisms into new products – considering the possible negative ramifications at the outset, rather than treating social media and privacy issues as an afterthought. The FTC refers to this notion as “Privacy by Design.” Second, “Just-In-Time” is a concept about notice and encourages companies to communicate with the public in a simple way that prompts them to make informed decisions about their data in terms that are clear and that require an affirmative action (e.g., checking a box). Finally, the FTC calls for greater transparency around data collection, use and retention. The FTC asserts that consumers have a right to know what kind of data companies collect, and should be able to learn how sensitive that data is and how it is intended to be used. The FTC’s report is intended to inform policymakers, including Congress, as they legislate on privacy – and to motivate companies to self-regulate and develop best practices.

David Shonka, Principal Deputy General Counsel at the FTC in Washington, D.C., warns, “There is a real tension between the situations where a company needs to collect data about a transaction versus the liabilities associated with keeping unneeded data due to privacy concerns. Generally, archiving everything is a mistake.” Shonka arguably reinforces the case for instituting an intelligent archive, whether a company is regulated or not: an archive that is selective about what it ingests based on content, and that applies an appropriate deletion cycle to defined data types/content according to a policy. This ensures that private consumer information expires in a timely manner, while retaining the benefits of retrieval for a defined period if necessary.
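As a rough illustration of what “selective ingestion plus a deletion cycle” could look like in practice, consider the Python sketch below. Everything in it is hypothetical: the rule names, the keywords, and the retention periods are invented for the example and are not drawn from any regulation or product. Real policies would be set by counsel and records managers, and real classifiers would use far more than substring matching.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    keywords: Tuple[str, ...]
    retain_days: Optional[int]  # None means "do not archive at all"


# Hypothetical policy, checked in order; the first matching rule wins.
POLICY = (
    Rule("junk", ("unsubscribe", "newsletter"), None),
    Rule("consumer_private", ("ssn", "date of birth"), 365),
    Rule("transaction", ("invoice", "purchase order"), 7 * 365),
)
DEFAULT_RULE = Rule("general", (), 2 * 365)  # assumed fallback retention


def classify(text):
    """Return the first policy rule whose keyword appears in the text (naive substring match)."""
    lowered = text.lower()
    for rule in POLICY:
        if any(keyword in lowered for keyword in rule.keywords):
            return rule
    return DEFAULT_RULE


def ingest(text, received):
    """Return (category, expiry_date), or None if the item should never enter the archive."""
    rule = classify(text)
    if rule.retain_days is None:
        return None
    return rule.name, received + timedelta(days=rule.retain_days)


def is_expired(expiry, today):
    """True once an archived item has reached its expiry date and is due for deletion."""
    return today >= expiry
```

The point of the sketch is the shape of the decision, not the rules themselves: trash is rejected at the door, private consumer data gets a short clock, and everything that is kept carries an expiry date from the moment it is ingested.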

The Non-Regulated Use Case. When will comprehensive social media policies, retention and monitoring become more prevalent in the non-regulated sectors? In the case of FINRA and the SEC, regulations were issued to the financial industry. In the case of the FTC, guidance has been given to companies regarding how to avoid false advertisement and protect consumer privacy. The two are not dissimilar in effect. Both require a social media policy, monitoring, auditing, technology, and training. While there is no clear mandate to archive social media if you are in a non-regulated industry, this can’t be too far away. This is evidenced by companies that have already implemented social media monitoring systems for reasons like brand promotion/protection, or healthcare companies that deal with highly sensitive information. If social media is replacing email, and social media is essentially another form of electronic evidence, why would social media not be part of the integral document retention/expiry procedures within an organization?

Content-based monitoring and archiving is possible with technology available today, as the financial sector has demonstrated. Debbi Corej, who is a compliance expert for the financial sector and has successfully implemented an intensive social media program, says it perfectly: “How do you get to yes? Yes you can use social media, but in a compliant way.” The answer can be found at LegalTech New York, January 30 @ 2:00pm.

2012: Year of the Dragon – and Predictive Coding. Will the eDiscovery Landscape Be Forever Changed?

Monday, January 23rd, 2012

2012 is the Year of the Dragon – which is fitting, since no other Chinese Zodiac sign represents the promise, challenge, and evolution of predictive coding technology more than the Dragon.  The few who have embraced predictive coding technology exemplify symbolic traits of the Dragon that include being unafraid of challenges and willing to take risks.  In the legal profession, taking risks typically isn’t in a lawyer’s DNA, which might explain why predictive coding technology has seen lackluster adoption among lawyers despite the hype.  This blog explores the promise of predictive coding technology, why predictive coding has not been widely adopted in eDiscovery, and why 2012 is likely to be remembered as the year of predictive coding.

What is predictive coding?

Predictive coding refers to machine learning technology that can be used to automatically predict how documents should be classified based on limited human input.  In litigation, predictive coding technology can be used to rank and then “code” or “tag” electronic documents based on criteria such as “relevance” and “privilege” so organizations can reduce the amount of time and money spent on traditional page by page attorney document review during discovery.

Generally, the technology works by prioritizing the most important documents for review by ranking them.  In addition to helping attorneys find important documents faster, this prioritization and ranking of documents can even eliminate the need to review documents with the lowest rankings in certain situations. Additionally, since computers don’t get tired or daydream, many believe computers can even predict document relevance better than their human counterparts.

Why hasn’t predictive coding gone mainstream yet?

Given the promise of faster and less expensive document review, combined with higher accuracy rates, many are perplexed as to why predictive coding technology hasn’t been widely adopted in eDiscovery.  The answer really boils down to one simple concept – a lack of transparency.

Difficult to Use

First, early predictive coding tools attempt to apply a complicated new technological approach to a document review process that has traditionally been very simple.  Instead of relying on attorneys to read each and every document to determine relevance, the success of today’s predictive coding technology typically depends on review decisions input into a computer by one or more experienced senior attorneys.  The process commonly involves a complex series of steps that include sampling, testing, reviewing, and measuring results in order to fine tune an algorithm that will eventually be used to predict the relevancy of the remaining documents.

The problem with early predictive coding technologies is that the majority of these complex steps are done in a ‘black box’.  In other words, the methodology and results are not always clear, which increases the risk of human error and makes the integrity of the electronic discovery process difficult to defend.  For example, the methodology for selecting a statistically valid sample is not always intuitive to the end user.  This fundamental problem could result in improper sampling techniques that could taint the accuracy of the entire process.  Similarly, the process must often be repeated several times in order to improve accuracy rates.  Even if accuracy is improved, it may be difficult or impossible to explain how accuracy thresholds were determined or to explain why coding decisions were applied to some documents and not others.
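For readers wondering what choosing a sample size involves, the Python sketch below applies a standard textbook formula (Cochran’s formula with a finite population correction) for estimating a proportion. It is offered only as an illustration of why the choice is not intuitive; it is an assumption on my part that a review tool would use anything like it, and the defaults (95% confidence, a margin of error of plus or minus 2%, and a worst-case proportion of 0.5) are illustrative.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.02, p=0.5):
    """Documents to sample to estimate a proportion within +/- margin at the given confidence.

    Uses Cochran's formula with a finite population correction; p=0.5 is the
    most conservative (largest sample) assumption about the true proportion.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    n0 = (z ** 2) * p * (1 - p) / margin ** 2           # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)                # finite population correction
    return min(population, math.ceil(n))
```

Two things tend to surprise newcomers: the required sample grows very slowly with the size of the collection, and tightening the margin of error is far more expensive than raising the collection size. Neither is obvious unless the tool shows its work.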

Accuracy Concerns

Early predictive coding tools also tend to lack transparency in the way the technology evaluates the language contained in each document.  Instead of evaluating both the text and metadata fields within a document, some technologies actually ignore document metadata.  This omission means a privileged email sent by a client to her attorney, Larry Lawyer, might be overlooked by the computer if the name “Larry Lawyer” is only part of the “recipient” metadata field of the document and isn’t part of the document text.  The obvious risk is that this situation could lead to privilege waiver if it is inadvertently produced to the opposing party.
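The “Larry Lawyer” scenario can be shown with a few lines of Python. This is a deliberately simplified sketch, not a description of any vendor’s privilege screen: the document is a plain dictionary, and the field names (body, sender, recipients) and the function name are hypothetical.

```python
def mentions_counsel(doc, counsel_names, include_metadata=True):
    """Return True if any counsel name appears in the searched fields of doc.

    doc is a dict with optional "body", "sender", and "recipients" keys
    (hypothetical field names). With include_metadata=False only the body
    text is searched, mimicking a tool that ignores metadata.
    """
    fields = [doc.get("body", "")]
    if include_metadata:
        fields.append(doc.get("sender", ""))
        fields.extend(doc.get("recipients", []))
    haystack = " ".join(fields).lower()
    return any(name.lower() in haystack for name in counsel_names)


email = {
    "sender": "Clara Client <clara@example.com>",
    "recipients": ["Larry Lawyer <larry@example.com>"],
    "body": "Please advise on the attached agreement before Friday.",
}
```

Searching only the body of this email finds no sign of counsel, while including the sender and recipient fields flags it immediately, which is exactly the gap described above.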

Another practical concern is that some technologies do not allow reviewers to make a distinction between relevant and non-relevant language contained within individual documents.  For example, early predictive coding technologies are not intelligent enough to know that only the second paragraph on page 95 of a 100-page document contains relevant language.  The inability to discern what language led to the determination that the document is relevant could skew results when the computer tries to identify other documents with the same characteristics.  This lack of precision increases the likelihood that the computer will retrieve an over-inclusive set containing many irrelevant documents.  This problem is sometimes loosely called ‘excessive recall,’ although in information retrieval terms it is low precision, and it is important because it increases the number of documents requiring manual review, which directly impacts eDiscovery cost.
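In standard information retrieval terms, the over-inclusiveness described above shows up as low precision: recall measures how many of the truly relevant documents were retrieved, while precision measures how many of the retrieved documents are truly relevant. A minimal Python sketch, with hypothetical names and document ids, makes the distinction concrete.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved ids against the truly relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the tool retrieves ten documents, but only four of the
# five truly relevant documents are among them.
retrieved_ids = range(1, 11)      # documents 1 through 10
relevant_ids = {1, 2, 3, 4, 11}   # document 11 was missed
precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

Here recall is respectable (four of five relevant documents found), yet six of the ten retrieved documents are irrelevant and still have to be reviewed by a person. That wasted review effort is the cost the paragraph above is describing.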

Waiver & Defensibility

Perhaps the biggest concern with early predictive coding technology is the risk of waiver and concerns about defensibility.  Notably, there have been no known judicial decisions that specifically address the defensibility of these new technology tools even though some in the judiciary, including U.S. Magistrate Judge Andrew Peck, have opined that this kind of technology should be used in certain cases.

The problem is that today’s predictive coding tools are difficult to use, complicated for the average attorney, and opaque in the way they work.  All these limitations increase the risk of human error.  Introducing human error increases the risk of overlooking important documents or unwittingly producing privileged documents.  Similarly, it is difficult to defend a technological process that isn’t always clear in an era where many lawyers are still uncomfortable with keyword searches.  In short, using black box technology that is difficult to use and understand is perceived as risky, and many attorneys have taken a wait-and-see approach because they are unwilling to be the guinea pig.

Why is 2012 likely to be the year of predictive coding?

The word transparency may seem like a vague term, but it is the critical element missing from today’s predictive coding technology offerings.  2012 is likely to be the year of predictive coding because improvements in transparency will shine a light into the black box of predictive coding technology that hasn’t existed until now.  In simple terms, increasing transparency will simplify the user experience and improve accuracy which will reduce longstanding concerns about defensibility and privilege waiver.

Ease of Use

First, transparent predictive coding technology will help minimize the risk of human error by incorporating an intuitive user interface into a complicated solution.  New interfaces will include easy-to-use workflow management consoles to guide the reviewer through a step-by-step process for selecting, reviewing, and testing data samples in a way that minimizes guesswork and confusion.  By automating the sampling and testing process, the risk of human error can be minimized which decreases the risk of waiver or discovery sanctions that could result if documents are improperly coded.  Similarly, automated reporting capabilities make it easier for producing parties to evaluate and understand how key decisions were made throughout the process, thereby making it easier for them to defend the reasonableness of their approach.

Intuitive reports also help the producing party measure and evaluate confidence levels throughout the testing process until appropriate confidence levels are achieved.  Since confidence levels can actually be measured as a percentage, attorneys and judges are in a position to negotiate and debate the desired level of confidence for a production set rather than relying exclusively on the representations or decisions of a single party.  This added transparency allows the type of cooperation between parties called for in The Sedona Conference Cooperation Proclamation and gives judges an objective tool for evaluating each party’s behavior.
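One hedged reading of “confidence levels measured as a percentage” is a confidence interval around an accuracy estimate taken from a reviewed sample. The Python sketch below uses the Wilson score interval, a standard statistical method; whether any given predictive coding tool reports this particular interval is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(hits, n, confidence=0.95):
    """Wilson score interval for a proportion: hits correct out of n sampled documents."""
    if n <= 0 or not 0 <= hits <= n:
        raise ValueError("need n > 0 and 0 <= hits <= n")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    p = hits / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Hypothetical check: 90 of 100 sampled documents were coded correctly.
low, high = wilson_interval(90, 100)
```

A report built on numbers like these gives the parties something specific to negotiate: not “trust us, it is accurate,” but a stated range at a stated confidence level, which can be narrowed by reviewing a larger sample.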

Accuracy & Efficiency

2012 is also likely to be the year of transparent predictive coding technology because technical limitations that have impacted the accuracy and efficiency of earlier tools will be addressed.  For example, new technology will analyze both document text and metadata to avoid the risk that responsive or privileged documents are overlooked.  Similarly, smart tagging features will enable reviewers to highlight specific language in documents to determine a document’s relevance or non-relevance so that coding predictions will be more accurate and fewer non-relevant documents will be recalled for review.

Conclusion - Transparency Provides Defensibility

The bottom line is that predictive coding technology has not enjoyed widespread adoption in the eDiscovery process due to concerns about simplicity and accuracy that breed larger concerns about defensibility.  Defending the use of black box technology that is difficult to use and understand is a risk that many attorneys simply are not willing to take, and these concerns have deterred widespread adoption of early predictive coding technology tools.  In 2012, next generation transparent predictive coding technology will usher in a new era of computer-assisted document review that is easy to use, more accurate, and easier to defend. Given these exciting technological advancements, I predict that 2012 will not only be the year of the dragon, it will also be the year of predictive coding.