PhotoDNA

(en.wikipedia.org)

24 points | by sandwichsphinx 2 days ago

18 comments

  • zaptrem an hour ago

    https://anishathalye.com/inverting-photodna/

    You could almost certainly produce nearly photo realistic PhotoDNA inversions with a finetuned diffusion model now. Is it possible to create a perceptual hashing algorithm where this isn't possible?
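
    Not PhotoDNA itself, but a minimal difference-hash (dHash) sketch shows what kind of signal a perceptual hash retains: coarse layout rather than exact bytes, which is exactly what an inversion model learns to reconstruct from. Hash size and details below are illustrative:

      # Minimal dHash sketch, NOT PhotoDNA: it keeps only the rough layout of
      # light and dark regions, and that surviving structure is what makes
      # inversion feasible at all.
      import numpy as np
      from PIL import Image

      def dhash(path, hash_size=8):
          # Grayscale and shrink: fine detail is discarded, coarse layout survives.
          img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
          px = np.asarray(img, dtype=np.int16)
          # One bit per horizontal neighbour comparison -> 64-bit hash.
          bits = (px[:, 1:] > px[:, :-1]).flatten()
          return sum(int(b) << i for i, b in enumerate(bits))

      def hamming(a, b):
          return bin(a ^ b).count("1")

      # Re-encoded or resized copies of one photo land a few bits apart;
      # unrelated photos average roughly 32 bits apart.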

  • noduerme 2 hours ago

    What's novel about this? iirc, Apple withdrew its plan to hash photos client-side a couple years ago after an outcry. Dropbox has been hashing every file forever to save storage space. Store your shit with a cloud provider, expect it to get scanned, right?

    Also, there are a million cute methods to make two different photos produce the same hash; that was actually what the outcry about Apple's version was about. The more the hash algorithm tried to produce the same hash for different variants of a photo, the more likely it was that someone could get their hands on a flagged hash and theoretically send you an innocuous looking photo that registered as CSAM. Pretty sure that's why Apple pulled it.
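
    Roughly, the same coarseness that makes the hash robust to re-encoding also gives an attacker a smooth target to optimize against. A toy sketch of that trade-off, using a stand-in average-pool hash (not NeuralHash or PhotoDNA), nudging an innocuous image until its hash collides with a flagged one:

      # Toy illustration only; the stand-in "hash" is an 8x8 average-pooled
      # brightness map thresholded at its median, not any real system's hash.
      import numpy as np

      CELLS = 8

      def embed(img):
          h, w = img.shape
          return img.reshape(CELLS, h // CELLS, CELLS, w // CELLS).mean(axis=(1, 3))

      def toy_hash(img):
          e = embed(img)
          return e > np.median(e)

      def nudge_toward(innocuous, target, steps=300, lr=16.0):
          # The embedding is a linear, lossy map of the pixels, so plain
          # gradient descent on the embedding mismatch converges quickly.
          x = innocuous.astype(float).copy()
          goal = embed(target.astype(float))
          cell = innocuous.shape[0] // CELLS
          for _ in range(steps):
              diff = embed(x) - goal
              grad = np.kron(diff, np.ones((cell, cell))) / cell**2
              x = np.clip(x - lr * grad, 0, 255)
          return x

      rng = np.random.default_rng(0)
      a = rng.integers(0, 256, (64, 64)).astype(float)  # stand-in "innocuous" image
      b = rng.integers(0, 256, (64, 64)).astype(float)  # stand-in "flagged" image
      adv = nudge_toward(a, b)
      print((toy_hash(adv) == toy_hash(b)).mean())      # close to 1.0: hashes collide
      print(np.abs(adv - a).mean())                     # mean pixel change stays modest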

    • the_mitsuhiko 25 minutes ago

      > What's novel about this?

      Literally in the first paragraph you can see that it’s not novel.

      > PhotoDNA was developed by Microsoft Research and Hany Farid, professor at Dartmouth College, beginning in 2009

    • JimDabell 44 minutes ago

      > that was actually what the outcry about Apple's version was about. The more the hash algorithm tried to produce the same hash for different variants of a photo, the more likely it was that someone could get their hands on a flagged hash and theoretically send you an innocuous looking photo that registered as CSAM.

      That was totally infeasible. There were two separate hashes, a public one and a private one, and there needed to be multiple false positives for the system to trigger. So not only would you need to generate collisions for two separate hashes simultaneously, including one for which there are no public details, you would need to do it for several images.
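
      Schematically, the gate looked something like the toy sketch below (not Apple's actual protocol; the hash functions and threshold value are placeholders), which is why a single forged collision against the public hash got you nowhere:

        # Toy sketch of the gating described above, not Apple's protocol.
        # An image only counts if it collides with BOTH hash databases, one of
        # which was never published, and nothing triggers until several such
        # matches accumulate. All names and values here are placeholders.
        THRESHOLD = 30  # illustrative value only

        def is_match(image, public_db, private_db, public_hash, private_hash):
            # Both independent perceptual hashes must hit their databases.
            return public_hash(image) in public_db and private_hash(image) in private_db

        def should_trigger(images, public_db, private_db, public_hash, private_hash):
            # Nothing is flagged for review until several images match.
            hits = sum(is_match(img, public_db, private_db, public_hash, private_hash)
                       for img in images)
            return hits >= THRESHOLD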

      People made a lot of assumptions about the design without actually reading the papers Apple published on how it would work. So there’s this caricature of the system in people’s minds that is a lot simpler and easier to fool than the reality. That’s what Apple was forced to react to.

    • kalleboo an hour ago

      > theoretically send you an innocuous looking photo that registered as CSAM. Pretty sure that's why Apple pulled it.

      Apple also had human reviewers in the mix. The only reason they pulled it was PR/optics.

      • jchw an hour ago

        I would assert the only reason they pursued it in the first place was PR/optics, since the "optics" of not being able to proactively police what users do with the E2EE services you provide is something of a problem. That said, I think the concept of having your own computer covertly report you to the authorities is a level too dystopian to accept, even from Apple.

      • thayne an hour ago

        I agree the reason they pulled it was probably PR/optics. But given the problems with human review of apps on the App Store, I wouldn't be confident that an underpaid employee somewhere wouldn't blindly agree with the algorithm.

      • zaptrem an hour ago

        Wouldn't that require exfiltrating the original photo? I remember them swearing that wasn't part of the deal.

        • jchw an hour ago

          Going from memory here, but IIRC the deal was that on device they'd compute a perceptual hash with the publicly known algorithm, and if that matched, they'd send the photo to be checked against a second perceptual hash that wasn't publicly disclosed (to try to mitigate the problem of intentional collisions), and if both were positive matches, they would have human reviewers in the loop.

        • JimDabell an hour ago

          It was a lot more advanced and abuse-resistant than people assumed. I really wish people had read how it worked instead of guessing it was something a lot simpler. There were two different perceptual hashes. If both matched, and the number of positive matches was high enough, Apple would be able to decrypt a thumbnail. Neither the device nor the server could independently check for a match, so the device couldn't just scan all your files and flag the matches. It was tied into the iCloud upload process.
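
          The "nothing is decryptable until enough matches accumulate" part works, in spirit, like threshold secret sharing: each match contributes one share of a decryption key, and below the threshold the shares reveal nothing. A toy Shamir-style sketch of that idea (not Apple's actual construction; the threshold here is arbitrary):

            # Toy Shamir-style threshold sharing, just to illustrate the idea of
            # "decryptable only past N matches". Not Apple's actual construction.
            import random

            P = 2**127 - 1   # a prime large enough for a toy key
            T = 5            # threshold: shares needed to recover the key

            def make_shares(secret, n):
                # Random degree-(T-1) polynomial with f(0) = secret; each match
                # would carry one point (x, f(x)) as its share.
                coeffs = [secret] + [random.randrange(P) for _ in range(T - 1)]
                f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
                return [(x, f(x)) for x in range(1, n + 1)]

            def recover(shares):
                # Lagrange interpolation at x = 0; with fewer than T shares the
                # result is information-theoretically unrelated to the secret.
                secret = 0
                for i, (xi, yi) in enumerate(shares):
                    num, den = 1, 1
                    for j, (xj, _) in enumerate(shares):
                        if i != j:
                            num = num * -xj % P
                            den = den * (xi - xj) % P
                    secret = (secret + yi * num * pow(den, P - 2, P)) % P
                return secret

            key = random.randrange(P)
            shares = make_shares(key, 10)
            assert recover(shares[:T]) == key       # enough matches: key recovered
            assert recover(shares[:T - 1]) != key   # below threshold: almost surely not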

          • zaptrem 17 minutes ago

            While this is understandable, the unfortunate issue was that Apple could be coerced into adding images that certain authoritarian governments didn’t like to the list. Though imo it’s all moot if iCloud Photos isn't end-to-end encrypted anyway.

  • Bayes7 2 hours ago

    Is there any good article/paper that describes how it actually works or is implemented, not just in high-level, hand-wavy terms?
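
    From what little is public, my rough mental model is the sketch below (grayscale, downscale, a grid of cells, crude edge/gradient statistics per cell, a 144-byte vector compared by distance rather than equality), but the actual feature extraction has never been published, hence the question:

      # Rough mental model only; the real PhotoDNA features are not public.
      import numpy as np
      from PIL import Image

      GRID = 6   # 6x6 cells x 4 features per cell = 144 values
      CELL = 8   # working image is (GRID*CELL) x (GRID*CELL) pixels

      def photodna_like(path):
          img = Image.open(path).convert("L").resize((GRID * CELL, GRID * CELL))
          px = np.asarray(img, dtype=np.float32)
          gy, gx = np.gradient(px)   # vertical and horizontal intensity gradients
          feats = []
          for r in range(GRID):
              for c in range(GRID):
                  sl = np.s_[r * CELL:(r + 1) * CELL, c * CELL:(c + 1) * CELL]
                  # Four crude edge statistics per cell.
                  feats += [np.abs(gx[sl]).mean(), np.abs(gy[sl]).mean(),
                            gx[sl].mean(), gy[sl].mean()]
          f = np.array(feats)
          # Quantise to one byte per feature, since the real hash is 144 bytes.
          return np.clip(f / (np.abs(f).max() + 1e-9) * 127 + 128, 0, 255).astype(np.uint8)

      def distance(h1, h2):
          # Matching uses a distance threshold, not exact equality.
          return int(np.abs(h1.astype(int) - h2.astype(int)).sum())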

  • ogurechny an hour ago

    Secretive, unaccountable, uncontrollably expanding, driven by a shady “independent” NGO that is in fact completely in bed with certain branches of government. The Russian DPI censorship system is really like that... Oh, I'm sorry, we're discussing something that is a decade older and belongs to the “free world”, which is a completely different thing.

    These things are simply selling their services to the highest bidder. It's a business model built on the market for power and connections. They are made to be offered to, and controlled by, entities that enjoy having such tools. Sometimes they are also offered to smaller fish, like media corporations, to hurt competitors (pirates and foreign services). Also, social media corporations can proudly state that they themselves “censor nothing”, because the censorship is outsourced.

    There's a great portrayal of people who run such services: https://www.newyorker.com/magazine/2019/11/04/a-cybersecurit...