Business Ethics & Corporate Crime Research Universidade de São Paulo

A Rhetoric Guide for Readers of CSAM Transparency Reports

Image Retrieved from: Nexo jornal

Author: Carolina Christofoletti

Link to the original: click here

Saturday morning, 6 a.m., 15 degrees Celsius: I was getting ready to take my Compliance exam when I decided to check what was going on on Instagram. As soon as I opened the app, SaferNet Brasil's post was there, shining as the very first post of the day on my timeline: CSAM reports in Brazil grew 33.4% from January to April 2021 compared with the same period of 2020. The exam could wait two more minutes, and I decided to visit SaferNet's page.

During the national month for Child Sexual Abuse and Exploitation prevention, the Brazilian hotline's page was full of “report CSAM” campaigns. That caught my attention, because I could not find, in any of their posts, what exactly SaferNet Brasil wanted me to report. A tricky point for someone about to write her third Compliance exam. I put it on standby and, attending to the requests for commentary, decided to write this article for you today.

The fact that I could not find any post explaining what I should report, even though the imperative to report was there – that is, what is meant by pornografia infantil (“child pornography”, as it was written) – invalidated this data, almost immediately, in my research eyes.

True, SaferNet Brasil receives reports only from the public (the number of Federal Police reports it received in 2020 is zero, which is a pity, considering that it sometimes takes away the chance for lots of files to be hashed and for tendencies to be read with a big-picture approach). As my administrative request also showed, the number of reports received by the Brazilian Federal Police from SaferNet Brasil is likewise zero (read more about it here). True, SaferNet Brasil displays on its report page a definition of what CSAM means for Brazil (a topic for another article). True, international legislation is not harmonized, and this could make Brazil a hosting center for other kinds of material that were produced abroad and explicitly infringe international legislation (I wrote about it here).

But there is also something else that is true: the conclusions derived from the report-numbers argument are a fallacy (a false conclusion), and report numbers are, per se, an aporia (a premise that leads you nowhere). And with aporia and fallacy on the table, I must call Greek rhetoric back in to help with the task of showing where the argumentative path gets lost. Step by step, though.

Measuring the working metrics of a CSAM reporting channel

My purpose with this article is to challenge, rhetorically, two common arguments (including those behind the above-mentioned report campaign). First, I will set out the compliance premise with you, without going deeply (at this time) into its rationale. Any report channel is only efficient if:

a)    it can reach exactly those who are facing what is to be reported (in this case, CSAM);

b)   those mentioned in a) can recognize it as CSAM; and

c)    those matching the a) and b) criteria trust the channel and decide, ethically, to report it.

A question for us to start thinking about the hotline case is exactly what happened to the CSAM report channels on Tor. From a practical point of view, I would suggest (with clear, experimental indications in what I call “Interview Data”) that Tor was handed over, at least in Brazil, to the police. That is why the police, and not the hotline, hold this data. But what about Tor reporters? What happened to them? Dangerously, they seem to have been forgotten.

You may take two positions here. One says that whoever is using Tor and finding those materials is not ethically willing to report them. The other says that whoever finds them using Tor is also not willing to report them, not for any ethical reason, but because they are not willing to take the risk of leaving Tor in order to report them.

Personally, I would go for the first one. Mainly because, before being a door of access to the DarkNet, Tor is privacy software. That is another thing for hotlines as a whole to start thinking about. But let us leave Tor aside for a while and look at the numbers. While the report figures are the data most commonly seen in the media, removals are the dataset that actually matters.

The main target of hotlines is, or should be, those who know where the CSAM is.

It matters, especially, if you start asking how many files reported to CSAM hotlines are never confirmed as CSAM. And from what I have read so far, I can say that this figure is also worth highlighting. There are plenty of reasons to discuss what is happening here, but one thing is true: if only 50% of your reports are matching reports, you are working at 50% efficiency, and the reporter is a case to study and to build a proposal around. And no, the solution is not an “only report if” notice (or displaying the legal paragraph, if you prefer), but rather the problem of reaching the right public, which is, most of the time, still hidden.
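To make that notion of efficiency concrete, here is a minimal sketch of the “matching reports” rate described above; the function name and the figures are illustrative assumptions of mine, not hotline data.

```python
# Minimal sketch with hypothetical numbers: the share of received reports
# that the hotline actually confirms as CSAM ("matching reports").

def match_rate(confirmed_csam_reports: int, total_reports: int) -> float:
    """Fraction of received reports confirmed as CSAM by the hotline."""
    return confirmed_csam_reports / total_reports

# Hypothetical example: 48,000 confirmed out of 96,000 received -> 0.5,
# i.e. the channel is working at roughly 50% efficiency in these terms.
print(match_rate(48_000, 96_000))
```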

My ideal model of Internet Governance would be one in which you could reach CSAM clubs in their very first stages, turning any “member to be” who has not yet been captured by the “CSAM club socialization” mechanics into someone who works as a real gatekeeper and reports it. If platforms are well protected against CSAM and criminals well protected against the police, hotlines' targets should be, precisely, those who know where the gap is.

For whoever is already trapped in the “if you report it now, you are already inside it” dilemma, hotlines should be a trusted point of reporting. For whoever disagrees with a given CSAM file while tolerating others – and the literature will point this out to us – hotlines should be a trusted point of reporting. For whoever knows where those files are to be found, hotlines should be a trusted point of reporting. And those reporters should be, at the end of the day, the ones most able to provide as much information as possible, and this “informational point” should also be exploited better than by a “report the URL here” policy alone.

This issue of criminals reporting criminal files is harder to address than it seems at first sight and, for that reason, I will not go deeply into it. But so as not to leave it in the air, I will just highlight that the reason CSAM networks on the DarkNet are so specialized, while hacker forums, drug markets and gun bazaars usually work together, is that reporting a file means, for criminals, disclosing the whole group. Was that, perhaps, the reason why Silk Road prohibited CSAM files and strictly moderated its market against them? CSAM clubs also moderate their files. But that is a topic for another article, and some empirical measurement is still needed there.

The day reports start to come directly from CSAM clubs, including where they are still dark, law enforcement authorities and CSAM hotlines will have formed an indestructible intelligence alliance. Pay attention, then, because non-openable files, such as the Cyrillic reports (read about them here), may have a greater law-enforcement potential than they initially seem to, considering the transitory mechanics in which they operate.

The unfiltered dataset

Meanwhile, back to the Open Web. Does an increased number of reports to the Brazilian CSAM hotline mean that Brazilians have become more aware of it? Not necessarily. Who guarantees that people are not reporting just anything? Does it mean CSAM production is growing? Not at all, because that conclusion would require filtering not only for duplicates but also for files that, produced a long time ago, were simply never seen before – both being cases that can, in the “known and new hashes” terminology, end up classified as new material.

While I do not have the first and third figures, I do have the second one, the duplicates. Even though I will not do the whole mathematics here, I will show you only the very first figure: of the 156,692 files reported to SaferNet Brasil, 62,318 were duplicated files hosted in the United States (SaferNet Indicadores >> 2020 >> Mundo >> Duplicadas >> Por país de hospedagem). That is approximately 40% of already-seen files being hosted in a country where CSAM reporting is mandatory. Curious data. What happened to these duplicates, and how are duplicates measured: duplicated since when, and how do you differentiate them from known hashes?
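For transparency, the approximation quoted above can be checked in a couple of lines, using the two figures exactly as displayed by SaferNet Indicadores and cited in the text.

```python
# Quick arithmetic check of the "approximately 40%" figure quoted above.
total_reported_files = 156_692
duplicates_hosted_in_us = 62_318

share = duplicates_hosted_in_us / total_reported_files
print(f"{share:.1%}")  # prints 39.8%, i.e. roughly 40%
```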

For research purposes, those are two different datasets: duplicates should be new, still non-hashed material, where speedy cooperation could resolve the duplicated reports. The argument in the hashing case is a different one: those files very probably bypassed platforms that are known to have those very same controls. It changes everything, does it not?
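Here is a sketch of how the two datasets could be told apart, under assumed definitions: the hash choice, the variable names and the idea that the hotline keeps a local “already seen” set are my illustrative assumptions, not SaferNet's actual methodology.

```python
# Illustrative sketch only: separating known-hash matches, duplicate reports of
# not-yet-hashed material, and genuinely new material. Definitions are assumed.

import hashlib

known_hashes: set[str] = set()       # hashes already on an industry hash list
seen_this_period: set[str] = set()   # files already reported to this hotline

def classify(file_bytes: bytes) -> str:
    digest = hashlib.sha256(file_bytes).hexdigest()
    if digest in known_hashes:
        # Already-hashed material: the question is how it bypassed platforms
        # that are known to run these very same hash controls.
        return "known-hash match"
    if digest in seen_this_period:
        # New, still non-hashed material reported more than once: speedy
        # cooperation (and hashing) could resolve these duplicated reports.
        return "duplicate report"
    seen_this_period.add(digest)
    return "new material"
```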

CSAM reports have grown, but how many matching CSAM reports have you really received?

While the hotline received 96,590 CSAM reports in 2020 (read it here), it does not display how many CSAM files were removed in total after it had assessed them. Note with me, from now on, that material removed voluntarily by the platforms, and not assessed by hotline members, cannot in any case be part of the removal dataset, right? Otherwise you are poisoning the data, since we do not know whether CSAM was the actual reason why the platform itself, or the author, removed it.

But there is another figure to highlight here, coming from SaferNet Brasil's Helpline data: even though there were 96,590 CSAM reports in 2020, only 17 people accessed the Hotline chat or e-mail for CSAM purposes during the entire year (read it here). What do they ask, and how could the hotline improve its own numbers? Hidden, golden compliance data. The fact that they are 17 out of 96,590 very probably indicates that there is a matter of trust to be analyzed here.

But note that not even the removal numbers say anything meaningful. A figure such as how many CSAM files were found on a platform is held not by industry but by hotlines and, except in very exceptional cases (SaferNet Brasil displays, for example, its top 10 removals by platform), it is not publicly available. But then, platforms are also policing their channels according to their Terms of Service. And things must be clear here: what are we counting as removals?

For me, when I read removals, I read removals done by the platform itself. It may well be that hotlines communicated that a specific file was reported and that, out of risk avoidance or because of a real violation of its Terms of Service, the platform removed it. That is why, in terms of removals, the dataset should look, more or less, like this: number of unique files assessed by the hotline as CSAM and removed by the platforms (before or after any actual communication).

And when we talk about unique files, we must also take into account hash variations of the same image, which can occur for different computational reasons. For the purpose of measuring the number of newly produced files, those belong to another dataset.
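Putting the two previous paragraphs together, a minimal sketch of the proposed removal metric might look like the following; the record fields and the preference for a perceptual hash are assumptions for illustration, not an existing hotline schema.

```python
# Minimal sketch of the removal metric proposed above: unique files assessed as
# CSAM by the hotline itself AND confirmed removed by the platform. Field names
# and structure are hypothetical; real hotline schemas will differ.

from dataclasses import dataclass

@dataclass
class ReportRecord:
    file_hash: str             # ideally a perceptual hash, so trivial variations of
                               # the same image collapse into one "unique file"
    assessed_as_csam: bool     # assessed by hotline analysts, not only by the platform
    removed_by_platform: bool  # removal confirmed, before or after the hotline's notice

def removal_metric(records: list[ReportRecord]) -> int:
    """Count unique files assessed as CSAM by the hotline and removed by platforms.

    Files removed voluntarily by a platform without hotline assessment are
    excluded on purpose, so the dataset is not poisoned by unknown removal reasons.
    """
    unique = {r.file_hash for r in records
              if r.assessed_as_csam and r.removed_by_platform}
    return len(unique)
```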

Filtering this dataset is, as such, the first step to take CSAM research out of the data aporia it currently finds itself in. Something to think about.