USP - Universidade de São Paulo

CSAM hash checks not on individual files, but on disk images: A word on forensics.


Author: Carolina Christofoletti


Imagine you are a forensic expert working for an Internet Crimes Against Children section, and you are working on a very particular case that looks like this: You have received a tip from a foreign law enforcement agency that an individual in your jurisdiction was an active member of a Child Sexual Abuse Material (CSAM) club on the Darknet. Your suspect is a key player, responsible for uploading the majority of the material found in the CSAM club, in which each ‘level’ requires a new access password. The foreign law enforcement agency that gave you the tip had access to only 2 of at least 10 levels.

As a forensic expert, and considering that your suspect was operating in a place where he had, at least, a perceived high level of security (e.g., no hash checks on Dark Web pages), you expect to find at least some other files of the same nature as the ones you are looking for on his personal device(s). Suppose that, setting aside all the complications that could arise here, you already have the device in your hands and are about to start the extraction.

The first thing to notice here is that those law enforcement operation photos, in which police officers sit in front of a computer with a folder full of extensive collections, usually do not correspond to reality, especially when we are talking about operations against CSAM clubs (rings, networks or whatever you call them) and about professionalized organized crime.

As a matter of chain of custody, neither police officers nor forensic experts ever touch the computer before extracting the disk image. By the time the photo was taken, the suspect's device had already been copied. Further, if we are talking about professional criminals, illegal files are not saved to the Desktop (1) and things are, most of the time, encrypted (2).

If your agency is not equipped with powerful forensic software suited to such cases, you may expect to spend a huge amount of time figuring out what is going on there and, maybe, some files will remain hidden right up until the computer is destroyed by law enforcement personnel. So what do you, my forensic expert friend, do in this case?

As a pertinent aside, I would like to dedicate a few paragraphs to the question of encrypted files. In the history of CSAM clubs, cases such as the Wonderland Club, whose complete “collection” was never opened because the files were encrypted, are well known. From a legal point of view, I might say that the problem with encryption there is this: if suspects have the right not to incriminate themselves and the keys cannot be “guessed” (read: broken) by the police, then, considering that the images need (at least theoretically) to be found to ground a conviction, no conviction is possible.

People who play on the criminals' side are intelligent, but people who play on the law enforcement side are equally intelligent. From a legal point of view, hashing CSAM files has solved a long-standing problem: even if the file was later deleted and the disk overwritten, the hash alert had already been raised. Mathematically, two different files can share a hash, but with a modern cryptographic hash function such a collision is so improbable that it can be ignored in practice. This means that every time a hash match alert is triggered, the file appeared in your working set. If the defendant wants to question the mathematics and play with collisions on complex modular hash functions, he or she will need to prove, visually, that the two files are different, and that means handing over the cryptographic key where one exists.
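
To make the idea of a hash match concrete, here is a minimal sketch of a known-file check over a folder of already extracted files. It assumes SHA-256 as the hash function and a plain in-memory set as the "database"; the function names `sha256_of_file` and `find_known_files` are hypothetical, and real hash lists may use other algorithms and formats.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_files(root, known_hashes):
    """Yield (path, digest) for each file under root whose digest is known.

    known_hashes is assumed to be a set of lowercase SHA-256 hex digests.
    """
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = sha256_of_file(path)
            if digest in known_hashes:
                yield path, digest
```

Because the digest depends only on the file's bytes, the match does not care what the file is called or where in the folder tree it sits.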

So far as I know, hash technology is not embedded at the hardware level but at the application level. That means, summarizing the technical distinctions, that hash checks run in your Facebook interface but not on your Apple device, even though things are stored there and law enforcement personnel have a way to get at them. But until law enforcement personnel come, criminals will silently keep downloading. It is time to think about a hardware-based architecture for hashes.

Despite that, we might also consider that both the law enforcement technology (hashing) and the criminal methodology (encryption) are based on the same discipline: cryptography. The difference is that, while you can “break” encryption with the proper keys so that the encrypted file becomes accessible, you can never “break” a hash back into its original value. The property lies not in the mathematics itself but in where it is applied: whether you encrypt the file itself or the key that gives access to it.

Even though I have already seen some discussion about hash checks (which are also used to verify integrity) on encrypted files, I would need to explore the topic more deeply before affirming anything of the kind. Briefly, though, the problem with hash checks on encrypted files is that encryption changes the hash value. Renaming a file or changing its extension, meanwhile, does not. The question is whether it is possible, considering the existing encryption algorithms, to take this into account for the purpose of identifying known CSAM hashes that are stored in an encrypted format. The hash check should, however, run normally whether the files are encrypted or not.
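
The effect of encryption on a hash value can be illustrated with a small sketch. Note the assumption: the cipher below is a toy XOR stream built from SHA-256, used only because it needs no external library. It is a stand-in for illustration, not real cryptography and not any tool mentioned here; the key and the sample content are hypothetical.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def toy_xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


plaintext = b"example file content" * 100
key = b"hypothetical key"

ciphertext = toy_xor_cipher(plaintext, key)
recovered = toy_xor_cipher(ciphertext, key)

plain_hash = hashlib.sha256(plaintext).hexdigest()
cipher_hash = hashlib.sha256(ciphertext).hexdigest()
recovered_hash = hashlib.sha256(recovered).hexdigest()

print("encrypted copy matches known hash:", plain_hash == cipher_hash)
print("decrypted copy matches known hash:", plain_hash == recovered_hash)
```

In this sketch the encrypted copy no longer matches the known hash, while the decrypted copy matches again, which is why a plain hash check only "sees" a file once the key is available.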

So, back to my forensic expert. What would he do? You may agree with me that, without any proper software, he may start exploring the computer using his previous knowledge of where things are usually located. You may also agree with me that this would take an eternity.

But imagine if my forensic expert had, just in case, an extraction tool in which hash checks could run directly on the disk image. That means the forensic tool would scan the image of the suspect's computer looking for known files, identified by their known hashes. In our case, we are talking about known CSAM hashes. Would that have helped my forensic expert colleague? A lot!
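
One way such a check could work on a raw image, without relying on folders or file names at all, is to hash the image block by block and compare each block against the block hashes of known files. The following is only a minimal sketch under strong assumptions: a raw (uncompressed, unencrypted) image, a 4096-byte block size, and known content that is stored block-aligned. The names `block_hashes` and `scan_image` are hypothetical and do not describe any specific forensic product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; real file systems vary


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Hash every full block of a known file's content."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data) - block_size + 1, block_size)
    }


def scan_image(image_path, known_blocks, block_size=BLOCK_SIZE):
    """Return the byte offsets of image blocks whose hash is in known_blocks."""
    offsets = []
    with open(image_path, "rb") as image:
        offset = 0
        while True:
            block = image.read(block_size)
            if len(block) < block_size:
                break  # stop at the end of the image (a partial block is ignored)
            if hashlib.sha256(block).hexdigest() in known_blocks:
                offsets.append(offset)
            offset += block_size
    return offsets
```

The returned offsets are what tell the examiner where on the disk to look. A real tool would also need to handle trivial blocks (for example, blocks of all zeros), which could otherwise produce meaningless matches.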

But why? Because a hash match would have shown my forensic expert where to look. The files could have been hidden in the section that connects to the printer, to the keyboard or wherever. The extraction tool will find them. More than that: once the location of a known CSAM file stored on the device has been pointed out, the forensic expert can go to that folder with a higher expectation of finding, alongside it, new, non-hashed CSAM material that the suspect was also storing. The security strategy tends to be hiding rather than fragmentation (not least because complex paths usually need to be recorded somewhere). If criminals do not know which hashes are known, they could also fragment things, but they would be doing so in complete darkness.

Even though I am aware that software for hash checks on disk images already exists, I would like to turn, in this paragraph, to its security settings. Hash databases must remain hidden, and checks should be allowed ONLY for law enforcement personnel. The fact that it is not always like that scares me a lot. In some places, hash values are openly published, mixed with others from copyright-infringing files. Open-source tools can be useful, but crawlers have catastrophic potential if they are run with hash values on their front pages and with “e-mail alerts” for which all you need is any e-mail address whatsoever. If you are a law enforcement officer reading this, please be aware of that.

We must therefore guarantee that, in putting a CSAM hash check forensic tool on the market, we are not allowing criminals to check their own files against updated data. There are two solutions here: either you professionally vet your clients, or you link your alerts to a police agency, if possible one outside your country. That way, for every hash match, a police agency (and probably an international one) would receive this data, which should be accompanied by other data that I will not discuss on this occasion.

Does this solve the problem of fragmented, regional databases? Very probably, if law enforcement authorities, wherever in the world they are located, are not managing CSAM databases themselves but receiving, as regular updates, the new values that enter the regional, national or any other databases.

See it as the equivalent of a scenario where lots of white-hat hackers (hotlines and national police) inspect software daily and report their findings to whoever is in charge of what would look, here, like an open-source code initiative. At the end of the day, the manager approves the submission, salts it (precisely to avoid checks like those mentioned above, in case criminals tell each other which files triggered the alert) and launches it as a database update. Exactly like what happens with your antivirus.
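
One possible reading of the "salting" step can be sketched as follows. This is an assumption on my part, not an established scheme: instead of distributing plain hashes, the central manager distributes keyed digests (HMAC-SHA256) of them, so that an update is useless for lookups to anyone who does not hold the secret key. The names `protect`, `build_update` and `is_known`, and the key itself, are hypothetical.

```python
import hashlib
import hmac


def protect(file_hash: str, secret_key: bytes) -> str:
    """Return a keyed digest of a plain hash (HMAC-SHA256)."""
    return hmac.new(secret_key, file_hash.encode("ascii"), hashlib.sha256).hexdigest()


def build_update(plain_hashes, secret_key: bytes) -> set:
    """Turn approved plain hashes into a protected database update."""
    return {protect(h, secret_key) for h in plain_hashes}


def is_known(file_bytes: bytes, protected_db: set, secret_key: bytes) -> bool:
    """Check a file against the protected database; requires the secret key."""
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    return protect(file_hash, secret_key) in protected_db


# Hypothetical usage: only a tool holding the key gets a meaningful answer.
key = b"hypothetical key held by the licensed tool"
update = build_update({hashlib.sha256(b"known file").hexdigest()}, key)
print(is_known(b"known file", update, key))
print(is_known(b"known file", update, b"some other key"))
```

In this sketch the plain hash of a file does not appear in the update at all, so comparing a published list against one's own files yields nothing without the key. How the key is stored and issued to registered tools is left open here.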

If things worked like that, the police task in CSAM matters would be simplified. Known CSAM files would not be able to ‘hide’ any more. Forensic tools would be for law enforcement only, with a proper registry for that. Databases would be controlled by a centralized agency, which would be allowed to receive hash values from all over the world.

Does it solve the problem? Whether it solves it, I do not know, but if the question is whether it helps things as they stand in the current state of the art, I am sure that it does.

Something to think about.