Business Ethics & Corporate Crime Research Universidade de São Paulo

Insight series – CSAM Research with a single forensic dataset: Day and time of file creation

Image Retrieved from: WallPaper Access

Author: Carolina Christofoletti


I have been insisting for a while on the importance of not mixing “hash-value” datasets. Unlike my previous approach, which was dedicated to online social media platforms, I wish on this occasion to apply the same “do not mix it” rationale to CSAM Law Enforcement Operations. Instead of running known-hash searches on the suspects’ hard disks, my suggestion here is to hash the CSAM files found on the investigated forum and run that hash set, alone, against the suspects’ computers.

We need to keep track of where things originated, where they were first seen, how those files are shared between different CSAM forums and what belongs to which place. And there is no way to follow up on that if we do not keep track of some metrics.

I am proposing a different methodology. Do not simply run the known-hash search and then start looking for new files. Hash the forum and run this specific dataset against one copy of the seized disks. On the other copy, run the known-hash searches, because this comparative data might also be of value, especially if, at some future time, we can predict with a degree of certainty where files possibly come from.

This is only part of my dream of “CSAM research” as “CSAM intelligence”. At least from what I have read so far, no such “duplicate” research has been done with offenders coming from a CSAM forum.

“Duplicates” are sometimes seen as “research rubbish” whose detection should be as automated as possible. The problem with removal-based automation, however, is that it loses important data simply by not collecting it at the proper time. I want to insist on duplicates and, more specifically, on their appearance in the Law Enforcement Operations dataset.

What if one discovers that some Child Sexual Abuse Images were seized, as duplicates, on the computers of more than one CSAM offender within the very same Law Enforcement Operation? With this question, I start this morning’s article.

Researchers look for patterns in everything. And this case could be no different. Same forum, same images. Trivial. What then?

Be careful with generalizations. And to argue my point here, I will need to work with decompositions.

“Same forum, same images” is a hasty conclusion. Even though we can verify that the images are indeed the same, we cannot deduce categorically, and without further evidence, that they come from the same place. Watch out, everyone, because a simple “resized” version of a CSAM forum file may indicate a different source. So may files that appear on the CSAM offender’s computer prior to their posting (if ever) on the forum.

My dataset here consists of three pieces of non-sensitive information, and the algorithm is expected to look like this (A+B):

A: Select the “Forum files” dataset. Hash it (hash search A). Extract the day and time of posting.

B: Extract the hard disk image from the CSAM offender’s computer. Run hash search A. Extract the day and time of file creation.

A+B: Compare the variance of the metrics and run it again with the hard disk of another CSAM offender belonging to the same operation, until your dataset is large enough.
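The A+B comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash strings, suspect labels and timestamps below are invented placeholders, the function name `posting_to_creation_hours` is hypothetical, and “variance of the metrics” is read here as the population variance of the delay between forum posting and file creation.

```python
from datetime import datetime
from statistics import pvariance

# Step A (hypothetical data): forum hash -> day and time of posting.
forum_posts = {
    "hash_aaa": datetime(2021, 3, 1, 10, 0),
    "hash_bbb": datetime(2021, 3, 5, 18, 30),
}

# Step B (hypothetical data): suspect -> {hash -> day and time of file creation}.
disk_files = {
    "suspect_1": {
        "hash_aaa": datetime(2021, 3, 1, 12, 0),
        "hash_ccc": datetime(2021, 2, 20, 9, 0),  # not a forum file
    },
    "suspect_2": {
        "hash_aaa": datetime(2021, 3, 2, 10, 0),
        "hash_bbb": datetime(2021, 3, 4, 18, 30),  # on disk before posting
    },
}

def posting_to_creation_hours(forum, disk):
    """Hours from forum posting to file creation, for forum hits only."""
    deltas = {}
    for file_hash, created_at in disk.items():
        posted_at = forum.get(file_hash)
        if posted_at is not None:
            deltas[file_hash] = (created_at - posted_at).total_seconds() / 3600
    return deltas

# Step A+B: repeat the comparison for every disk in the same operation.
results = {
    suspect: posting_to_creation_hours(forum_posts, disk)
    for suspect, disk in disk_files.items()
}

# Spread of the delay for one forum file across all suspects holding it.
delays_aaa = [d["hash_aaa"] for d in results.values() if "hash_aaa" in d]
spread_aaa = pvariance(delays_aaa)

for suspect, deltas in results.items():
    print(suspect, deltas)
```

A negative delay means the file existed on the suspect’s disk before it was posted on the forum, which is exactly the kind of case that should not be lost by mixing datasets.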

That CSAM offenders are very often members of several different CSAM forums, in order to “complete their collections”, is a fact known in the literature. The goal here is to find a way to get to this data, at least as an indicator, this time from Law Enforcement Agencies’ operation data.

As an indicator we could think, for example, of forum-new CSAM files: files not originally found on the investigated forum but found on the suspect’s computer, and created (downloaded) on a day and at a time that, when cross-checked against the forum activity of that same period, rules out that forum as a possible source. Those are, probably, external files.

We would certainly need further data to be able to affirm that such a file has an external source. One must also consider the “hidden folders” hypothesis and the “removed forum file” hypothesis, which can be validated by observing what is found with other criminal peers. If the file is unique among the CSAM collections seen across specific CSAM forum members, then the external source hypothesis is the stronger one.
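The “forum-new” indicator can be sketched as a simple filter. This is a hedged illustration, not a validated method: it assumes that “forum activity” is available as a list of session intervals for the suspect, and the names `external_candidates`, `sessions` and `margin` are hypothetical, as is all the data.

```python
from datetime import datetime, timedelta

def external_candidates(disk, forum_hashes, sessions, margin=timedelta(hours=1)):
    """Hashes that are absent from the forum AND were created outside
    every known forum session (widened by a tolerance margin)."""
    candidates = []
    for file_hash, created_at in disk.items():
        if file_hash in forum_hashes:
            continue  # known forum file, not "forum-new"
        in_session = any(
            start - margin <= created_at <= end + margin
            for start, end in sessions
        )
        if not in_session:
            candidates.append(file_hash)
    return sorted(candidates)

# Hypothetical placeholders only.
forum_hashes = {"hash_aaa"}
sessions = [(datetime(2021, 3, 1, 10, 0), datetime(2021, 3, 1, 14, 0))]
disk = {
    "hash_aaa": datetime(2021, 3, 1, 12, 0),  # forum file
    "hash_ccc": datetime(2021, 2, 20, 9, 0),  # off-forum, outside any session
    "hash_ddd": datetime(2021, 3, 1, 13, 0),  # off-forum, but during a session
}

flagged = external_candidates(disk, forum_hashes, sessions)
print(flagged)
```

A file such as `hash_ddd`, created during a forum session, is deliberately not flagged: it remains compatible with the “hidden folders” and “removed forum file” hypotheses and needs the peer comparison described above.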

Sometimes, CSAM criminals are part of the very same CSAM forum even though they know neither one another nor the fact that they are interconnected. Do not forget, we are talking about “nicknamed” communities. One indicator of that could be that some “private CSAM files” found on the computer of one of the suspects on day X come to be uploaded to the forum by another user on day X+10. We must also consider the hypothesis that the two users are, in fact, the same person, which would require another variation metric.

Think with me. When we talk about seized CSAM we are, most of the time, also talking about files whose metadata was already cleaned at some point. Is it not possible to hypothesize that, if two suspects downloaded the same file within a very short interval of each other, we already have a possible approximation of when the file was uploaded to that common CSAM forum?
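As a minimal sketch of this hypothesis: if the download (file creation) times of one file across several suspects fall close together, the earliest of them gives a latest-possible upload time. The function name `upload_upper_bound`, the six-hour threshold and the timestamps are assumptions made only for illustration.

```python
from datetime import datetime, timedelta

def upload_upper_bound(creation_times, max_spread=timedelta(hours=6)):
    """If all downloads of one file fall within max_spread, return the
    earliest one as a latest-possible upload time; otherwise None."""
    if len(creation_times) < 2:
        return None  # one holder alone says nothing about a common source
    earliest, latest = min(creation_times), max(creation_times)
    if latest - earliest <= max_spread:
        return earliest
    return None

# Hypothetical creation times of the same hash on two suspects' disks.
close_times = [datetime(2021, 3, 1, 13, 30), datetime(2021, 3, 1, 12, 0)]
far_times = [datetime(2021, 3, 1, 12, 0), datetime(2021, 3, 20, 12, 0)]

bound = upload_upper_bound(close_times)
no_bound = upload_upper_bound(far_times)
print(bound, no_bound)
```

The threshold is a judgment call: widening it produces more estimates, but weaker ones.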

What about manually changed metadata that in fact shows oddities, such as a file creation date later than the download date? When things start to collide, we start to correct wrong data, step by step, with other lateral metrics.

If we look at the images seized specifically from that forum and compare them to the hard disks seized from the criminals, could we not expect to find out, approximately, the day and time at which the suspect entered that specific forum?
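One hedged way to read this question in code: the earliest creation time among the forum-matching files on a disk marks the latest moment by which the suspect was already in contact with that forum’s material. The function name `earliest_forum_match` and the data are hypothetical, and the sketch assumes the timestamps have not been altered.

```python
from datetime import datetime

def earliest_forum_match(disk, forum_hashes):
    """Earliest creation time among files whose hash is in the forum set,
    or None when the disk holds no forum file."""
    matches = [created for file_hash, created in disk.items()
               if file_hash in forum_hashes]
    return min(matches) if matches else None

# Hypothetical placeholders only.
forum_hashes = {"hash_aaa", "hash_bbb"}
disk = {
    "hash_aaa": datetime(2021, 3, 2, 10, 0),
    "hash_bbb": datetime(2021, 2, 25, 8, 15),
    "hash_ccc": datetime(2020, 12, 1, 9, 0),  # not a forum file, ignored
}

entrance_estimate = earliest_forum_match(disk, forum_hashes)
print(entrance_estimate)
```

Because only the forum-specific hash set is used, files from other sources (such as `hash_ccc`) cannot pull the estimate backwards, which is the practical payoff of not mixing datasets.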

Forget the forum and let us look at the routes: does this not indicate, similarly, the approximate time frame in which the information found may have forensic relevance? Where was the invitation shared, after all? How much would that have helped investigative data requests?

Could we hypothesize, by sorting in ascending order the creation dates and times (download times) found among offenders holding the same CSAM file, who the “oldest” criminal among them is and where the hub possibly originated, even if it took 3 months for this file to be uploaded somewhere? Do not forget that, sometimes, this view is a partial one, because the seizures are also partial.
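The ordering itself is trivial to sketch. In the example below the suspect labels, hashes and timestamps are invented, and `order_holders` is a hypothetical helper; the result is only a ranking over the seized disks, so it inherits the partiality of the seizures.

```python
from datetime import datetime

def order_holders(disk_files, file_hash):
    """Suspects holding file_hash, sorted by ascending creation time."""
    holders = [
        (disk[file_hash], suspect)
        for suspect, disk in disk_files.items()
        if file_hash in disk
    ]
    return [suspect for _, suspect in sorted(holders)]

# Hypothetical placeholders only.
disk_files = {
    "suspect_1": {"hash_aaa": datetime(2021, 3, 1, 12, 0)},
    "suspect_2": {"hash_aaa": datetime(2020, 12, 1, 9, 0)},
    "suspect_3": {"hash_bbb": datetime(2021, 1, 5, 9, 0)},  # does not hold hash_aaa
}

ranking = order_holders(disk_files, "hash_aaa")
print(ranking)
```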

The research insights here are many. We will stay with those for a while.

Think. Carefully. About it.