Ethics and ‘Big Data’ Geospatial Research

On March 3, 2016, a paper titled “Tagging Banksy: using geographic profiling to investigate a modern art mystery” was published. The article describes a technique for analysing the geospatial aspects of a large data set of artworks in order to identify the artist behind the graffiti. The abstract already gives you an idea of where this is heading:

More broadly, these results support previous suggestions that analysis of minor terrorism-related acts (e.g., graffiti) could be used to help locate terrorist bases before more serious incidents occur, and provides a fascinating example of the application of the model to a complex, real-world problem.

The authors took the set of artworks from Banksy’s official website and then visited the sites in London and Bristol in person to record each precise location. Astonishingly, they then used a 2008 Daily Mail article as their source for all sorts of personal details:

Three addresses in London were identified: one in the Kingsland Road area, where [identity1] lived with [identity2] in 2004–5, and two for [identity1]’s girlfriend (now wife), [identity3], in the Great North Road area and in the Old Street area. Suspect sites in Bristol included [identity1]’s house in the Easton area of the city, The Plough in Easton (for whom [identity1] played football), and their playing fields at Baptist Mills Primary School, as well as [identity1]’s old school, Bristol Cathedral School.

The article then applies the DPM (Dirichlet process mixture) model to support the authors’ theory, drawing on highly personal details of [identity1]’s history. In the conclusion, the authors make an offhand remark about how much weight their result can actually bear:

With no other serious ‘suspects’ to investigate, it is difficult to make conclusive statements about Banksy’s identity based on the analysis presented here, other than saying the peaks of the geoprofiles in both Bristol and London include addresses known to be associated with [identity1].

Finally, the “Ethical note” that concludes the article is just astonishing in its naïvety:

Ethical note: the authors are aware of, and respectful of, the privacy of [identity1] and his relatives and have thus only used data in the public domain. We have deliberately omitted precise addresses.

Ethical analysis

Many researchers on the Internet have already expressed their surprise and astonishment at this publication. The authors do not seem to be aware of the impact that their paper may have on the privacy of Banksy and of the people named in it. In this article I have redacted the real names of the people analysed. The scientific contribution of the paper would have been exactly the same if the original authors had done the same thing. The article is about the validity and applicability of a certain model, not about the identity of Banksy.
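The kind of redaction used in this article is mechanical enough to sketch in a few lines of Python. This is only an illustration, not the procedure the paper's authors or I actually followed: the function name `pseudonymise` and the sample names are hypothetical, and real anonymisation would also have to deal with indirect identifiers such as addresses, schools and football clubs.

```python
import re


def pseudonymise(text, names):
    """Replace each listed name with a stable placeholder such as [identity1].

    `names` is an ordered list of real names; the n-th name always maps to
    the placeholder [identityN], so repeated mentions stay linkable.
    """
    if not names:
        return text, {}

    # Hypothetical placeholder scheme, mirroring the [identityN] tags above.
    mapping = {name: f"[identity{i}]" for i, name in enumerate(names, start=1)}

    # Try longer names first so a full name wins over a shorter name it contains.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))

    redacted = pattern.sub(lambda match: mapping[match.group(0)], text)
    return redacted, mapping


# Made-up sample text and names, for illustration only.
sample = "Alice Example lived with Bob Sample. Alice Example played football."
redacted, mapping = pseudonymise(sample, ["Alice Example", "Bob Sample"])
print(redacted)
```

Because every occurrence of a name maps to the same placeholder, a reader can still follow which person each location is associated with, which is all the model evaluation needs.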

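The authors try to justify this by claiming that they only used data in the public domain. But combining scattered public details into a single profile of one named person creates a privacy risk that none of the individual sources posed on its own. Researchers should hold themselves to higher standards and think about the possible impact of their research before publishing it.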

It is hard to believe that this research project passed through an Institutional Review Board (IRB) or Ethics Committee. This publication demonstrates that such review should also be required in fields other than the medical and psychological sciences, which directly involve human test subjects.