Predictive coding: taming the data tiger

There is growing interest in predictive coding in the UK, with the first-ever judicial approval for its use during disclosure in English High Court proceedings coming in February, in the Pyrrho Investments case. In May The Lawyer’s live webcast, in association with FTI Consulting, examined issues raised by the case and assessed the impact it will have on the UK market for predictive coding.

Matt Byrne: So Jon [Fowler], could you first explain what predictive coding is?

Jon Fowler: It’s a unique mix of technology, legal expertise and statistics, and its aim is to bring efficiency and consistency to the review process.

Byrne: To put it into context, how many documents do lawyers face these days on typical cases?

Fowler: It can be up to many millions. In the Pyrrho case we were talking three million documents, and that’s not unusual.

Byrne: Can you give a few details on how predictive coding differs from traditional linear review?

Fowler: Traditionally, in a case like Pyrrho you might have taken millions of documents from your clients, run keyword searches and a date range, and then put those through to your team of lawyers to review. With predictive coding we take a small, statistically relevant, random sample of data, have it reviewed by case experts such as the partner on the case, and then use those decisions to create a predictive coding model. It’s important that the data is truly random so we get the computer to pick it for us.

That first set of data is used to train a predictive coding model that looks at the characteristics of each document and tries to decide why you’ve marked it relevant or not relevant. So, for example, every time it says ‘fraud’ you’re saying it’s relevant, and every time it says ‘table’ you’re saying it’s not relevant. So it’s learning from those decisions.

But it’s not going to be ready first time so we send it back into the data set and ask it to find more documents it thinks will help it learn more. We have those coded and we use that to train the model. It’s an iterative process.

At some stage we’re going to be happy with the results that we’re getting but we need to validate to make it defensible, so we go back and get another random sample, have that reviewed and score it to validate the results of the model.
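
[Editor’s illustration] The steps Fowler describes (random seed sample, training, iterative rounds, validation on a fresh random sample) can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any commercial review platform works: the word-weighting model, the choice to send the reviewer the documents the model is least sure about, the `reviewer` function standing in for the case expert, and the synthetic documents are all hypothetical.

```python
import math
import random
from collections import Counter


def train(labelled):
    """Fit per-word log-odds weights from (text, is_relevant) pairs."""
    rel_counts, irr_counts = Counter(), Counter()
    n_rel = n_irr = 0
    for text, is_relevant in labelled:
        words = set(text.lower().split())
        if is_relevant:
            rel_counts.update(words)
            n_rel += 1
        else:
            irr_counts.update(words)
            n_irr += 1
    weights = {}
    for word in set(rel_counts) | set(irr_counts):
        p_rel = (rel_counts[word] + 1) / (n_rel + 2)  # add-one smoothing
        p_irr = (irr_counts[word] + 1) / (n_irr + 2)
        weights[word] = math.log(p_rel / p_irr)
    bias = math.log((n_rel + 1) / (n_irr + 1))
    return weights, bias


def score(model, text):
    """Return a relevance score between 0 and 1 for one document."""
    weights, bias = model
    z = bias + sum(weights.get(w, 0.0) for w in set(text.lower().split()))
    return 1.0 / (1.0 + math.exp(-z))


def build_model(documents, reviewer, rng, seed_size=20, batch_size=10, rounds=3):
    """Random seed sample, then iterative rounds of review and retraining."""
    seed = rng.sample(range(len(documents)), seed_size)  # the computer picks
    labelled = {i: reviewer(documents[i]) for i in seed}
    for _ in range(rounds):
        model = train([(documents[i], y) for i, y in labelled.items()])
        unlabelled = [i for i in range(len(documents)) if i not in labelled]
        # Assumption: review next the documents scored closest to 0.5,
        # i.e. the ones the model is least sure about.
        unlabelled.sort(key=lambda i: abs(score(model, documents[i]) - 0.5))
        for i in unlabelled[:batch_size]:
            labelled[i] = reviewer(documents[i])
    return train([(documents[i], y) for i, y in labelled.items()]), labelled


def validate(model, documents, reviewer, rng, exclude, sample_size=30, threshold=0.5):
    """Review a fresh random sample and compare it with the model's calls."""
    candidates = [i for i in range(len(documents)) if i not in exclude]
    tp = fp = fn = 0
    for i in rng.sample(candidates, sample_size):
        predicted = score(model, documents[i]) >= threshold
        actual = reviewer(documents[i])
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    return {
        "recall": tp / (tp + fn) if tp + fn else None,
        "precision": tp / (tp + fp) if tp + fp else None,
        "sample_size": sample_size,
    }


# Hypothetical data: 'fraud' documents are relevant, 'table' documents are not.
FILLERS = ["payment", "invoice", "meeting", "lunch", "shipping"]
documents = [f"{'fraud' if i % 2 == 0 else 'table'} {FILLERS[i % 5]} memo" for i in range(200)]


def reviewer(text):
    """Stand-in for the case expert's relevance decision."""
    return "fraud" in text.split()


rng = random.Random(0)
model, labelled = build_model(documents, reviewer, rng)
metrics = validate(model, documents, reviewer, rng, exclude=labelled)
print(len(labelled), "documents reviewed for training;", metrics)
```

The validation sample is drawn only from documents the reviewer has not already coded, so the recall and precision figures are measured on material the model has not been trained on.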

Byrne: What do you mean by ‘score’?

Fowler: It means we’re going to give every document in your data set, your millions of documents, a score as to how relevant or not relevant the model believes it is to your case. We can use that to rank those documents from most relevant to least relevant. Then we can, for example, set a threshold and say everything above this score is relevant and everything below is irrelevant, or we could simply use it to rank the documents and look at the most relevant ones first.
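
[Editor’s illustration] The two uses of a score that Fowler mentions, ranking and applying a cut-off, might look like this in Python. The document identifiers, the scores and the 0.5 threshold are invented for the example; in practice the threshold would be a case-specific decision.

```python
def rank_documents(scores):
    """Return document ids ordered from most to least relevant."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def split_by_threshold(scores, threshold):
    """Treat scores at or above the threshold as relevant, the rest as not."""
    ranked = rank_documents(scores)
    relevant = [d for d in ranked if scores[d] >= threshold]
    not_relevant = [d for d in ranked if scores[d] < threshold]
    return relevant, not_relevant


# Hypothetical model output: one score per document in the data set.
scores = {
    "DOC-001": 0.92,
    "DOC-002": 0.15,
    "DOC-003": 0.67,
    "DOC-004": 0.40,
    "DOC-005": 0.81,
}

review_order = rank_documents(scores)
relevant, not_relevant = split_by_threshold(scores, threshold=0.5)
print(review_order)   # most relevant first
print(relevant, not_relevant)
```

Ranking alone changes only the order in which lawyers see the documents; the threshold is what removes documents from review altogether, which is why it is the step that needs validating.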

Byrne: Why do you think predictive coding is gaining in significance in the UK market?

Fowler: We’ve got a bit of an advantage in that it’s been used in the US for quite some time so people have become broadly comfortable with the technology. And this judicial approval is going to make people a lot more comfortable.

Byrne: Giulia, to what extent does the fact that your external lawyers can now point to the Pyrrho case as an example bring you comfort?

Giulia Da Re: It provides significant comfort. In-house lawyers take disclosure obligations seriously. We want to feel our approach to disclosure is appropriate to the case and within the court rules. In-house lawyers are keen to avoid feeling like they’re going out on a limb with technology that’s not yet mainstream and hasn’t been approved by the courts. With that in mind, these cases will soften concerns considerably. I should add that Lloyds started using predictive coding before the Pyrrho judgment came out so we were already comfortable that it was a good tool.

Byrne: Master Matthews made it sound incredibly appealing so why do you think it hasn’t been used more widely?

Da Re: The ‘black box issue’ is a factor. Lawyers are used to being challenged at every step of litigation so we want to feel confident we can provide a transparent explanation of the steps we’re taking. In that context it’s not surprising there’s some caution about using a complex and unfamiliar technology.

I think that once practitioners become more familiar with predictive coding and realise there are justifiable and explainable steps they will be less concerned.

Byrne: Mark, do you think Pyrrho and last week’s BCA Trading case will increase clients’ appetite and willingness to use predictive coding?

Mark Chesher: It might. It certainly brings it to the fore. I’ve been using the software for two and a half years and in that time I’ve become more comfortable about how it works, where it works, where it might not work and where I can be confident in recommending that a client uses it.

Byrne: Do you agree with Jon’s points about the benefits of using it?

Chesher: Absolutely. It’s really useful in the way we’ve been using it – to prioritise review. We’ve used it in parallel with a traditional keyword search and manual review process, getting the top 10 or 20 per cent of relevant documents straight to the senior lawyers to review and then exercising judgement, which is where lawyers add value, rather than just churning through documents.

It’s also useful in QC [quality control], checking a manual review. It’s good at identifying anomalies in the way things have been coded.
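
[Editor’s illustration] One plausible way to use scores for quality control, offered as an assumption rather than a description of Chesher’s process, is to flag documents where a reviewer’s coding sits at odds with a confident model score. The cut-offs (0.8 and 0.2) and the sample data below are hypothetical.

```python
def flag_for_second_look(human_codes, model_scores, high=0.8, low=0.2):
    """List documents whose manual coding conflicts with a confident score."""
    flagged = []
    for doc_id, coded_relevant in human_codes.items():
        s = model_scores[doc_id]
        if coded_relevant and s <= low:
            flagged.append((doc_id, "coded relevant, scored low"))
        elif not coded_relevant and s >= high:
            flagged.append((doc_id, "coded not relevant, scored high"))
    return flagged


# Hypothetical manual review decisions and model scores.
human_codes = {"DOC-101": True, "DOC-102": False, "DOC-103": True, "DOC-104": False}
model_scores = {"DOC-101": 0.95, "DOC-102": 0.91, "DOC-103": 0.05, "DOC-104": 0.30}

for doc_id, reason in flag_for_second_look(human_codes, model_scores):
    print(doc_id, reason)
```

A flag is only a prompt for a second look: either the reviewer or the model may be the one that is wrong.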

Byrne: Giulia, what do you think?

Da Re: I agree. It’s helpful in prioritising documents you might want to review immediately and it makes the review process more efficient because you can have these collections of documents you anticipate are going to be more or less responsive.

Byrne: Mark, is it only appropriate to use it in cases with millions of documents?

Chesher: Not necessarily. The cases it’ll work best for are those with a high proportion of relevant documents and a reasonably tight set of issues, and the documents have been collected from people doing similar jobs. Where it’s going to struggle is where you’ve got huge lists of disparate issues and a wide variety of repositories of documents, and maybe different languages. It’s not necessarily just a case of – if you’ve got lots of documents, throw predictive coding at it.

Byrne: Does it lend itself only to being used by large legal services providers?

Chesher: No, possibly the converse. As a smaller firm with less resource it enables you to leverage off your subject matter expert’s knowledge and use predictive coding to do something that a few years ago would have been undertaken by a large number of paralegals or junior lawyers. You’re in more control of the process if you’re doing it yourself.

Byrne: Are there any types of disputes where it’s not appropriate?

Da Re: If there’s a small number of documents you’d need to do a cost-benefit analysis of whether the set-up costs are worth it in the long run.

Fowler: It’s going to work better on populations with a high number of relevant documents. What it’s not going to do is find the smoking gun that’s going to blow your case apart. The machine needs you to teach it and needs lawyers to show it what is interesting and what is not, and it will consistently mimic those decisions across other documents. If you didn’t know a document existed it’s not going to be able to find it for you. If that is something you need to do there are many other techniques we can use but predictive coding is probably not the one.

Byrne: What about costs? How much can predictive coding save clients?

Fowler: It’s hard to quantify because it depends on the case, but at the end of the day predictive coding is going to reduce the number of documents lawyers have to look at, so that’s going to cut technology costs and the costs of the people. In a recent case in the US we saved our client from reviewing over 87 per cent of the document set, saving them more than $1m.

Byrne: Giulia, from a client’s point of view, what are your concerns about using it?

Da Re: There are three main concerns. First, it might miss disclosable documents. Second, that material that is not responsive but is privileged or contains client confidential or sensitive information is disclosed unnecessarily. And third, that the court will decide predictive coding is not an appropriate approach for a particular case. Of course, with the two recent cases that last concern has been considerably softened.

Byrne: Jon, what are your thoughts on the risks?

Fowler: There are inherent risks using any technology in legal process but in some ways the risks are even higher using a keyword, as it may be mis-spelt or not even there. It’s all about ensuring you’re partnering with the right experts who can talk you through the process, and you’re using the right tools.

Byrne: Mark, will this technology and other types of AI put people out of jobs?

Chesher: They won’t put people out of jobs but they will change the way we work. A lot of this is really just meeting the challenge that’s being thrown at litigation lawyers by the proliferation of email and electronic documents generally. It’s just a tool in our armoury.

Fowler: Yes, the issue we’ve got here is the growing document population. Every year we get more and it’s not sustainable for any law firm to sit there and go through the number of documents you’d have to, to be able to claim you’d taken a proportionate approach. Predictive coding allows lawyers to get on with doing what they were trained to do, which is to add value to a case through their experience. I don’t think it’s going to be putting anyone out of a job any time soon.

Da Re: At Lloyds we speak to third-party service providers because we want to keep abreast of developments, and we’d expect our external counsel to do the same. So if we had a case that might be suitable we’d expect to have conversations around what we can use, whether it is suitable for the case and how we can overcome the risks.

Byrne: In the US it took about four years to take off. Will it be the same here?

Fowler: It’ll be much faster. We can learn from the experiences they’ve had in the US, so it will be taken up more quickly.

View the webcast at TheLawyer.com/webcast