Business Ethics & Corporate Crime Research Universidade de São Paulo

Building Big-Tech CSAM Trust & Safety from unexamined P2P Big-data mines


Author: Carolina Christofoletti


As part of my CSAM research ethics, pre-filtered comma-separated values (CSV) tables (textual tables that open in Excel, so to say) are the only datasets I work with. Recently, I had the opportunity to take an analytical look at the Child Sexual Abuse Material (CSAM) situation on peer-to-peer (P2P) networks through, once again, a pre-filtered CSV file.

Knowing how P2P networks work, and knowing also the URL-based situation, I must say that some points caught my immediate attention. Looking at the CSV file of a single CSAM downloader, the fact that one enters the network by downloading an illegal file seemed, at the very least, remarkable (and I plan to test this hypothesis against more extensive datasets).

Additionally, apart from the explicit titles, and apart from the fact that the search patterns of those criminals are fairly easy to recover (if one is willing to analyse them), there was something that seemed to me almost unbelievable: CSAM files are being labelled with social media stamps in their titles.

Who needs metadata if criminals are, very probably, writing down the original data extensively in a title section one cannot change?

Do those self-classifications match reality? I do not know. Validating this hypothesis is, however, possible, if the social media companies whose names appear in the titles of those CSAM files ever agree to test it. My hypothesis is this: the CSAM hash values that big-tech companies are looking for on their platforms also appear, in an interrelated way, on P2P platforms.

Forget not the time stamps: if Facebook, for example, finds on 19.06.2021 a new CSAM hash value on its platform which had already appeared on 03.03.2021 on a P2P platform, the P2P CSAM files that criminals claim to be Facebook-based could have served as important intelligence, if P2P and big-tech CSAM intelligence were ever integrated.
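The time-stamp comparison above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each side holds a table mapping a hash value to the day it was first seen; the function names, the table layout and the placeholder hashes are hypothetical and are not the matching methodology referred to in this article.

```python
from datetime import date, datetime


def parse_day(text: str) -> date:
    """Parse a DD.MM.YYYY date, the format used in the example above."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def p2p_lead_times(p2p_first_seen: dict, platform_first_seen: dict) -> dict:
    """Return {hash: days} for hashes seen on P2P before the platform found them."""
    leads = {}
    for hash_value, platform_day in platform_first_seen.items():
        p2p_day = p2p_first_seen.get(hash_value)
        # Keep only hashes known on both sides, with P2P strictly earlier.
        if p2p_day is not None and p2p_day < platform_day:
            leads[hash_value] = (platform_day - p2p_day).days
    return leads


# Placeholder hash identifiers and dates (assumptions, not real data).
p2p_first_seen = {
    "hash_a": parse_day("03.03.2021"),
    "hash_b": parse_day("01.07.2021"),
}
platform_first_seen = {
    "hash_a": parse_day("19.06.2021"),
    "hash_b": parse_day("20.06.2021"),
    "hash_c": parse_day("05.05.2021"),
}

leads = p2p_lead_times(p2p_first_seen, platform_first_seen)
print(leads)  # {'hash_a': 108}
```

With the dates from the example, the hash would have been visible on the P2P side 108 days before the platform found it, which is the kind of lead time an integrated intelligence flow could, in principle, exploit.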

Forget not the protocol: the day and time of a file's inclusion are embedded in the network protocol… as long as that file stays shared there. Fortunately, protocols are pattern-based, and those patterns can also be explored.

How actionable this intelligence is, or would have been, is something that, for that reason, also remains to be validated. A matching methodology exists: I have it already sketched, and it is waiting for a big-tech partner who wants to cooperate on those matters.

But the number of P2P CSAM files is overwhelming for big-tech companies to have to deal with. True, and that is why I am proposing the filter.

We must also work with another hypothesis about why criminals platform-stamp those files. Downloaders often do not really know what a file is when they select it for download (they will discover it only when they open their downloads), and the fact that this very file was once hosted by, for example, Facebook makes it look as if its “illegality is questionable”.

Big-tech companies should, as such, be worried about CSAM files that have their companies’ names stamped in the titles, damaging their Trust & Safety reputation. And a criminal whose uploads share this characteristic surely also deserves researchers’ attention.

Criminals and legal practitioners trust the power of those big-tech platforms to never host CSAM files so, for criminals, if Facebook hosted it, it might not be illegal. Criminals will never find out whether Facebook, for example, removed it or not. But, from the titles, they know Facebook hosted it. CSAM P2P networks are, as such, something that deserves a different criminological attention: they are not the same as their URL peers.

Crawling titles is easier than “crawling” images one by one, which requires much more complex technology. I have, however, already given you a (possibly) matching filter to work with, a filter that should serve to identify, speedily and coherently, “platform-new” CSAM material, both for P2P (where big-tech platforms are the source) and for big-tech platforms (where P2P is the source).
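The title-based filter can be sketched in a few lines operating on a pre-filtered CSV table. The column names (`hash`, `title`, `first_seen`), the list of platform names and the sample rows are assumptions made for illustration only; a real table would need its own mapping, and a plain substring match is only a first approximation.

```python
import csv
import io

# Assumption: an illustrative list of platform names to look for in titles.
PLATFORM_NAMES = ["Facebook", "Twitter", "YouTube"]


def platform_stamps(title: str, names=PLATFORM_NAMES) -> list:
    """Return the platform names that occur (case-insensitively) in a title."""
    lowered = title.lower()
    return [name for name in names if name.lower() in lowered]


def filter_stamped_rows(csv_text: str, title_column: str = "title") -> list:
    """Keep only the rows whose title carries at least one platform stamp."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stamped = []
    for row in reader:
        stamps = platform_stamps(row[title_column])
        if stamps:
            stamped.append({**row, "platform_stamps": stamps})
    return stamped


# Placeholder rows standing in for a pre-filtered table (not real data).
sample = (
    "hash,title,first_seen\n"
    "hash_a,placeholder facebook_clip,03.03.2021\n"
    "hash_b,placeholder other,04.03.2021\n"
)

rows = filter_stamped_rows(sample)
for row in rows:
    print(row["hash"], row["platform_stamps"])
```

Only the hash values of the stamped rows would then need to be handed over for matching, which keeps the volume that a platform has to deal with small.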

That would have been a way of killing two, three, four or more birds… with a single stone.

Think about it!