Did AC12 hire a bent forensic linguist?

The big reveal in the series finale of Line of Duty was that a forensic linguist identified ‘H’, aka “the Fourth Man”, not as some criminal mastermind but as the bumbling and over-promoted DI Ian Buckells. The TV police drama has followed the incremental investigations of anti-corruption unit AC12 of the fictional Central Police. Over the course of six series AC12 has established links between Organised Crime Groups and institutionalised police corruption, and in doing so the show has traded on a tension between procedural hyper-realism and credulity-stretching flights of fancy. The final message of the final episode appears to be that competence can become an ethics issue. Buckells was no Moriarty; his corruption arose from incompetence, or from a lack of enthusiasm for doing his job well. It was disappointing, therefore, that the culmination of AC12’s investigation relied on the evidence of a less than competent forensic linguist.

The linguistic plot twist of the show rested on the spelling “definately” in the communications from the anonymous H. This misspelling had been flagged in an earlier series, where we saw Superintendent Ted Hastings use “definately” in an online chat in which he assumed the linguistic identity of H. H was discovered by searching through police files for this misspelling, and in the final interview scene a report from a forensic linguist is produced which declares that H is in fact DI Buckells.

Each of these events relates to a task that has been researched by forensic linguists. Research in linguistic identity assumption shows that identifying lower-level linguistic features, such as spelling patterns, is necessary to take over someone’s identity successfully in an online chat, but it is not enough. Unless Hastings is also able to copy H’s linguistic turn-taking and topic control, he is more likely to be discovered. This is difficult but can be trained. Our experiments with trainee undercover online officers showed that forensic linguistic training reduced detections by suspicious interactants from about 75% to 25%. A further issue for Hastings is that he needs to avoid linguistic leakage of his own identity: he would be wise not to drop in the phrase “Mother of God” when chatting as ‘H’.

Using language style as the basis for an author search is also the subject of intensive research. One focus is to use writings by offenders on dark web fora as a basis for searching for the same individual’s writings on the open web. Author search was used in this way, as one of many techniques, in the investigation that identified Matthew Falder, one of the UK’s worst online sexual abusers. In another case, the single-word greeting “Hiyas” contributed to the identification of the paedophile Shannon McCoole. Searching for a writing style in a fairly small collection of police files is significantly easier than searching the whole of the Internet.

The comparative authorship analysis task seen in Line of Duty is perhaps the best-known application of forensic linguistics, and in the real world this is a task that can and does produce evidence that is taken to court. The dialogue of Buckells’ investigative interview would have us believe that such evidence can be very strong. Buckells is told that, on the basis of the spelling “definately” and a brief mention of some syntactic features, AC12’s forensic linguist “concludes there is a 95% probability the messages detected on GGM-13 were written by you.” This in itself is a red flag, suggesting that AC12’s forensic linguist is either incompetent or perhaps, for reasons unknown, framing Buckells to take the hit.

The tell that this is an inept or perhaps bent forensic linguist is the cited 95% probability. This is not only bad forensic linguistics; it is also poor forensic science. Contemporary research in forensic science suggests that identification evidence should not be presented as a conclusion of match or non-match, even with a mitigating probability score. Expressing the result in this way is both logically incorrect and usurps the role of the jury. It is the job of the forensic scientist to express the weight of their evidence, and this is best achieved by taking a competing-hypotheses approach. In AC12’s case we can easily set out two competing hypotheses with regard to the spelling “definately”, which lead to two probabilities to compare:

  1. the probability of seeing “definately” given the hypothesis that Buckells wrote the ‘H’ messages; and
  2. the probability of seeing “definately” given the hypothesis that anyone else wrote the ‘H’ messages.
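For readers who like to see the logic written down, here is the standard way these two probabilities are combined, sketched in notation that is conventional in forensic science rather than anything used in the show. E stands for the observed “definately” spelling in the ‘H’ messages, H_p for the hypothesis that Buckells wrote them, and H_d for the hypothesis that anyone else did. The second line is simply Bayes’ theorem in odds form.

```latex
\begin{aligned}
\mathrm{LR} &= \frac{P(E \mid H_p)}{P(E \mid H_d)} \\[4pt]
\underbrace{\frac{P(H_p \mid E)}{P(H_d \mid E)}}_{\text{posterior odds}}
  &= \mathrm{LR} \times \underbrace{\frac{P(H_p)}{P(H_d)}}_{\text{prior odds}}
\end{aligned}
```

The linguist’s contribution ends at the first line. A statement such as “a 95% probability the messages were written by you” is a posterior, and it can only be reached by supplying prior odds that depend on all the other evidence in the case, which is precisely the weighing the jury is there to do.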

To estimate the first probability you might trawl through all of Buckells’ and H’s writings to come up with a rate of occurrence per thousand words for each of them. This gives some measure of similarity between Buckells’ and H’s writings in this regard. But similarity is not enough. To estimate the second probability we need to research some base-rate information, and this is where things get tricky. A quick Google search gives me nearly 16,000,000 hits for the spelling “definately”, and it is unlikely that all of these are references to Line of Duty. The conclusion, though, is that this is not a rare spelling error, and if the spelling is common then the similarity has little evidential weight. If the spelling can be shown to be rare, the evidence might carry more weight. This weight of evidence is best expressed as the ratio of the two probabilities, a likelihood ratio, although studies show that juries struggle to understand these Bayesian likelihood ratios when they are expressed mathematically.
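To make the arithmetic concrete, here is a minimal Python sketch under a deliberately simple assumption: each probability is approximated by how often a writer (or a reference population) chooses the spelling “definately” over “definitely”. Every count in it is invented for illustration, and the function names are hypothetical. It shows the point made above: the same similarity between Buckells and H yields a weak likelihood ratio when the spelling is common and a much stronger one when it is rare.

```python
"""Toy likelihood-ratio calculation for one spelling variant.

Assumption: P(E | H) is approximated by how often a writer, or a reference
population, spells the word 'definately' rather than 'definitely'.
All counts are invented for illustration only; none are real data.
"""


def rate_per_thousand(count: int, total_words: int) -> float:
    """Occurrences of a feature per 1,000 words of text."""
    return 1000 * count / total_words


def variant_share(misspelled: int, standard: int) -> float:
    """Proportion of definitely-type tokens spelled 'definately'."""
    return misspelled / (misspelled + standard)


def likelihood_ratio(p_given_hp: float, p_given_hd: float) -> float:
    """P(E | Buckells wrote H's messages) / P(E | someone else wrote them)."""
    return p_given_hp / p_given_hd


# Hypothetical counts for the known and questioned writings.
buckells_rate = rate_per_thousand(count=9, total_words=6000)
h_rate = rate_per_thousand(count=12, total_words=8000)
buckells_share = variant_share(misspelled=9, standard=1)

# The same suspect evidence weighed against two hypothetical base rates.
common_base = variant_share(misspelled=30, standard=70)  # spelling is common
rare_base = variant_share(misspelled=1, standard=99)     # spelling is rare

lr_common = likelihood_ratio(buckells_share, common_base)
lr_rare = likelihood_ratio(buckells_share, rare_base)

print(f"Buckells: {buckells_rate:.1f} per 1,000 words; H: {h_rate:.1f}")
print(f"LR if 'definately' is common: {lr_common:.1f}")
print(f"LR if 'definately' is rare:   {lr_rare:.1f}")
```

With these made-up numbers both writers use the spelling 1.5 times per thousand words, yet the likelihood ratio is 3.0 against a common base rate and 90.0 against a rare one. Neither figure is a probability that Buckells wrote the messages, and the sketch ignores the genre and community variation that a real comparison corpus would have to handle.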

It is not enough, however, to know only the population rate for the “definately” spelling. A forensic linguist is above all a linguist, and understands how language varies across communities. Spellings that are globally rare can be locally common. Research would be required to establish the rate of occurrence of “definately” among the staff of the Central Police, as opposed to the general population. Spelling variants also differ in frequency across genres of communication, as many creative spellings are acceptable in online communication such as internet chat. The most difficult and time-consuming part of providing a forensic authorship analysis is identifying and collecting the best comparison samples, and there are always compromise decisions to make. Can I collect the language of police generally, or should I focus just on Central Police? Can I use the email database when the questioned messages are online chat? And so on. These are just some of the reasons why a lack of certainty can be a true mark of forensic linguistic expertise.

I would couch my report in terms of points of distinctiveness and consistency across the questioned samples, the known samples and the comparison corpus. I would definately never use 95% certainty. AC12’s forensic linguist, however, appears to be uninterested in base rates, uninterested in the cross-genre nature of their analysis, and has expressed their result as a percentage certainty of match. We have grounds to ask: are they incompetent, or are they bent?
