So if you send a picture to a Signal user, it's retrieved via cloudflare, and cached in a data center near that user; now you can look up the cache status and find the data center used. I'd say "deanonymization" is stretching it, unless the user is in the middle of nowhere (no other users near the data center). But interesting writeup anyway.
"Near a user" is also a big assumption. I'm ~200 miles from ORD and ~500 from IAD, but my ISP's peering & upstream arrangements mean Cloudflare serves my traffic from DFW, 700 miles away.
But, at the same time: Cloudflare isn't going to serve me a cache from Seattle, Manchester, or Tokyo. Pinning down an unknown Signal user to even a rough geographic location is an important bit of metadata that could combine to unmask an individual. Neat attack!
It's also quite insidious as you don't need to control anything on any server to get this information; as long as you can get your target to load a unique URL never before loaded by anyone else, you can simply later poll it with an unauthenticated HTTP GET from different locations, and find which one reports a Cloudflare HIT (or, even if they hid that information, finding the one that returns with lower latency).
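To make the "which one reports a HIT" step concrete, here is a minimal sketch that only parses response headers you have already collected; it does no network I/O. It assumes the responses carry Cloudflare's `CF-Cache-Status` and `CF-RAY` headers, with the data center code as the suffix of `CF-RAY`. The function names are made up for illustration.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

Observation = Tuple[Optional[str], Optional[str]]

def read_cache_observation(headers: Mapping[str, str]) -> Observation:
    """Return (cache_status, colo) from one response's headers."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("cf-cache-status")
    ray = lowered.get("cf-ray", "")
    # CF-RAY looks like "<request id>-<COLO>"; take the part after the last dash.
    colo = ray.rsplit("-", 1)[1] if "-" in ray else None
    return (status.upper() if status else None, colo)

def colos_with_hit(observations: Iterable[Observation]) -> List[str]:
    """List the data centers whose response was served from cache."""
    return sorted({colo for status, colo in observations
                   if status == "HIT" and colo})
```

Feeding it one header set per vantage point gives the list of data centers that already had the object cached, which is all the attack needs.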
If you're allowing user uploaded content, and you use Cloudflare as a CDN, you could mitigate and provide your users with plausible deniability by prefetching each uploaded URL from random data centers. But, of course, that's going to make your Cloudflare bill that much more expensive.
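A rough sketch of that prefetch idea, assuming you have some way to force a request through a specific data center. That part is a hypothetical callback here (`fetch_via_colo`), since I don't know of a general public way to do it; the only real logic is picking the decoys with a CSPRNG.

```python
import secrets
from typing import Callable, Iterable, List

def pick_decoy_colos(all_colos: Iterable[str], k: int) -> List[str]:
    """Pick k distinct data centers at random to warm as decoys."""
    pool = sorted(set(all_colos))
    # SystemRandom draws from the OS CSPRNG, so an attacker cannot predict
    # which data centers were chosen. sample() raises ValueError if k is
    # larger than the pool.
    return secrets.SystemRandom().sample(pool, k)

def warm_caches(url: str, all_colos: Iterable[str], k: int,
                fetch_via_colo: Callable[[str, str], None]) -> List[str]:
    """Fetch `url` through k random data centers (fetch_via_colo is hypothetical)."""
    chosen = pick_decoy_colos(all_colos, k)
    for colo in chosen:
        fetch_via_colo(url, colo)
    return chosen
```

The more decoys you warm per upload, the less a single HIT tells an observer, and the more requests you pay for.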
Cloudflare could allow security-sensitive clients to hide the cache-hit header and add randomized latency upon a cache hit, but the latter protection would also be expensive in how many connections must be kept alive longer than they otherwise would. Don't do anything on a personal device or account if you want your datacenter to be hidden!
Pre-fetching also becomes an issue for apps that are meant to be e2e encrypted, since it requires the server to download (read) every attachment. But if the app is already caching the attachment then they’re effectively reading it anyway.
(EDIT: Apparently signal e2e encrypts images prior to upload, so pre-fetching the encrypted blob from one or multiple servers would in fact be a mitigation of this attack.)
I do wonder if Telegram is as invulnerable as the author assumes. They might not be using Cloudflare for caching, or even HTTP, but the basic elements of this attack might still work. You’d just need to modify the “teleport” aspect of it.
Telegram doesn't use local CDNs for caching. All users are associated with one of about five telegram DCs, and upload files to their local DC. If a file was uploaded by a user on another DC, users connect to it temporarily to download the file.
The DC that a user is associated with is exposed by the API - you don't need to get them to upload a file to discover it - but it's so broad that it's not much of a deanonymizing signal. (Knowing that your target is in DC1, for example, just means that they're probably somewhere in North or South America. Or that they registered using a phone number that said they were.)
> Going forward uploaded content should never go through Cloudflare and it never really needed to.
The problem in this case isn't cloudflare. The problem is that these images load without the user's interaction and the person sending it gets to choose if it's cloudflare or not. So your statement within this context doesn't really work.
The person receiving it chooses to download images or whatever automatically though.
I dunno, I'd still say the problem is at least 50% cloudflare. Why should they make which datacenters have a resource cached be obvious public knowledge? I do agree though, one could still end up inferring this information noisily by sending an attachment, waiting a while, and then somehow querying a lot of DCs and trying to infer times to see if it's cached or not.
Personally, I've never been a fan of so many things like URLs being so public. I get the benefits of things like CDNs, and the odds against guessing a snowflake value and whatnot, but still... all attachments in Discord are public. If you have a URL, you have the attachment. And they're not the only ones with this kind of access model.
Isn’t that because the URL parameters are so long that by design they effectively _are_ the password protection for the resource ? They shouldn’t be able to ‘leak’ to unintended recipients.
Personally, like you I’m also not a huge fan of this, but URLs like that basically should be treated as the passwords. Don’t post them publicly / don’t give them out to people you don’t trust.
There's a part of me that's fine with it for a short-lived URL which contains a temporary access key but for a forever URL with a forever access key I'm not entirely happy with it.
I use it to share memes and shitpost but definitely not something to share sensitive content IMO.
For signal then the issue becomes saving who owns what image (so that you can re-issue “passwords”) and THAT is much more dangerous to the users than simply allowing users to grab semi-anonymous links into their cdn with enough of a url to be nearly impossible to iterate through every combination without hitting tons of rate limits. (Ignoring this location cache timing issue.)
Edit: Actually... (in signal's case) it might be possible to provide the user's device 2 tokens, 1 to access the url and 1 to issue new access links. Then the user can request a new access link with their second token when their url access token expires. Signatures would help prevent it from needing to be stored in the database. It would be interesting to try.
Edit2: Also I am now curious... does this mean only text messages are e2ee? yikes.
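For the "signatures would help prevent it from needing to be stored in the database" part of the two-token idea, here is a minimal sketch of a stateless, expiring access signature using HMAC. This is not how Signal does anything; the names, the message layout and the server-side secret are all assumptions for illustration.

```python
import hashlib
import hmac
import time
from typing import Optional

def sign_access(secret: bytes, attachment_id: str, expires_at: int) -> str:
    """Sign (attachment_id, expiry) so the server needs no per-link database row."""
    message = f"{attachment_id}|{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_access(secret: bytes, attachment_id: str, expires_at: int,
                  signature: str, now: Optional[int] = None) -> bool:
    """Accept the link only if it has not expired and the signature matches."""
    current = int(time.time()) if now is None else now
    if current >= expires_at:
        return False
    expected = sign_access(secret, attachment_id, expires_at)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, signature)
```

The second "refresh" token could be verified the same way with a different secret, so an expired link is reissued without the server recording who owns which attachment.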
My main gripe is that if someone finds a vulnerability that gives you a list of urls the model falls apart. I’ve seen this happen in organisations :/
But agree with your statement here and others about the lifetime of the data - if something is sensitive or secret you want proper access controls applied, not just openssl rand -hex 8
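For scale, `openssl rand -hex 8` gives 8 random bytes, i.e. 64 bits, printed as 16 hex characters. A small sketch of the same thing in Python, next to a longer token; the helper name is made up, and the point stands either way that an unguessable URL is not an access control.

```python
import secrets

def token_bits(num_bytes: int) -> int:
    """Bits of entropy in a token built from num_bytes random bytes."""
    return 8 * num_bytes

# Same shape as `openssl rand -hex 8`: 8 random bytes as 16 hex characters.
short_token = secrets.token_hex(8)

# 32 random bytes (256 bits), encoded as URL-safe base64 without padding.
long_token = secrets.token_urlsafe(32)
```

Both are only as secret as the least careful place the URL ends up, which is the failure mode described above.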
I doubt how useful it would be as an attack. As a single point of info it tells you next to nothing. As part of a composition of other indicators it would probably be the weak link in the chain, just causing noise in the not unlikely scenario where the person you're targeting is using a VPN.
If it was any less specific we'd be talking about a deanonymization attack that outs whether or not a target is still on Earth.
Oh, this attack would be a useful tool for e.g., identifying whistleblowers that travel a lot (e.g., in academia, military). If you know their Signal ID, you could send them images from time to time and then compare their coarse locations with travel information for a number of suspects.
I believe they'd have to accept the chat request before any images would be loaded?
Looking at the app options it seems to be possible to disable media auto-download entirely; there's tickboxes for Images/Audio/Video/Documents via Mobile Data/Wi-Fi/Roaming.
Yes, I agree. This attack won't work on competent / paranoid people. What I had in mind when writing the comment: a whistleblower who wants to inform the press about illegal practices in their company and installed Signal to communicate anonymously with journalists. Somehow, a detective working for the company got their Signal ID and contacted them, impersonating a journalist.
> not unlikely scenario where the person you're targeting is using a VPN
Do you think a large proportion of Signal users also use VPNs? I'd expect it would be a higher proportion than the general population but still only a small minority.
Being 'interesting' doesn't make you more likely to understand VPNs and opsec. I expect it makes you more likely to try, but there's a good chance of doing it ineffectively.
Note that CF will also route relative to the sites' plan. Enterprise sites are almost always routed to the closest DC, while if that DC is overloaded then lower tier websites, typically just Free sites, will get routed elsewhere (I suppose this is achieved via different anycast ranges where a specific DC is excluded). Although Discord, Signal, etc are almost certainly Enterprise sites.
Cloudflare does serve me from France. When I'm in Australia. (My ISP bought some IP addresses that were originally assigned to France, back in the early 90s.)
So though this does have implications, the assumptions they utilise, like always, are not universal.
for "normal people", that's a pain, but with enough resources,...
Although, it has edge use cases even for "normal people":
E.g., you suspect your coworker of catfishing you on, say, Discord. You know that he's in your city now, so you verify, then wait for him to leave for a vacation somewhere abroad and check again.
This is actually pretty smart, and shows that this exploit could be chained with other information to identify a specific individual. This could also be used to e.g. check which world-travelling reporter is communicating with you.
It's not an edge case. Using multiple sources of information to paint a more complete picture is the norm. That's how marketing profiles work, for example.
It gets more interesting when you think about the impact on groups. Sending an image to a group is enough for all devices associated with that group to be identifiable from CloudFlare's side, who additionally see a giant chunk of unencrypted traffic from the same client addresses going to other web sites. Given Cloudflare's less-than-straight approach to sales, it is astonishing the words "secure" and "Signal" ever appear in the same sentence.
CloudFlare get to see a fuckton of metadata from private and group chats, enough to trace who originally sends a piece of media (identifiable from its file size), who reads it, when it is read, who forwards it and to whom. It really doesn't matter that they can't see an image or video, knowing its size upfront or later (for example in response to a law enforcement request) is enough
> Given Cloudflare's less-than-straight approach to sales, it is astonishing the words "secure" and "Signal" ever appear in the same sentence.
This is an overly binary take. Security is all about threat models, and for most of us the threat model that Signal is solving is "mainstream for-profit apps snoop on the contents of my messages and use them to build an advertising profile". Most of us using it are not using Signal to skirt law enforcement, so our threat model does not include court orders and warrants.
Signal can and should append some noise to the images when encrypted (or better yet, pad them to a set file size as suggested by paulryanrogers in a sibling comment) to mitigate the risks of this attack for those who do have threat models that require it, but for the vast majority of us Signal is just as fit for purpose as we thought it was.
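To illustrate the "pad them to a set file size" idea, here is a minimal sketch that pads plaintext up to power-of-two buckets before encryption. This is not Signal's actual scheme; the bucket rule, the 8-byte length prefix and the function names are assumptions made for the example.

```python
def padded_length(n: int, minimum: int = 1024) -> int:
    """Smallest bucket (minimum * 2**k) that can hold n bytes."""
    if n < 0:
        raise ValueError("length must be non-negative")
    size = minimum
    while size < n:
        size *= 2
    return size

def pad(plaintext: bytes, minimum: int = 1024) -> bytes:
    """Prefix the real length, then zero-fill up to the bucket size."""
    # The 8-byte big-endian prefix lets the receiver strip the padding.
    body = len(plaintext).to_bytes(8, "big") + plaintext
    return body + b"\x00" * (padded_length(len(body), minimum) - len(body))

def unpad(padded: bytes) -> bytes:
    """Recover the original bytes using the stored length prefix."""
    n = int.from_bytes(padded[:8], "big")
    return padded[8:8 + n]
```

Padding has to happen before encryption so the CDN only ever sees bucketed ciphertext sizes. Power-of-two buckets can nearly double storage in the worst case, so a real design would probably want finer-grained buckets.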
Maybe not individual warrants (at least not warrants to do non-scalable collections like hardware bugs in one's phone - I.e. warrants that, most users, with high probability, are not subject to). But mass surveillance, e.g. NSA, even with 'mass warrants' (e.g. Verizon-FISA warrant), that everyone is subject to, is probably in most people's attacker model. I don't have a study handy, but it seems reasonable that most users use signal to protect against mass surveillance and signal advertises itself as being good for this.
Also Marlinspike and Whittaker are quite outspoken about mass surveillance.
If cloudflare can compile a big part of the "who chats with whom" graph, that is a system design defect.
I thought it was digits only but see there's always been the option to use an alphanumeric passphrase as the "PIN". That prevents brute-forcing for anyone that bothered to use one, right?
It was only digits initially (https://old.reddit.com/r/signal/comments/oc6ow4/so_a_four_di...), with nothing preventing very easy ones like "1234", but even after they fixed it they continued to call it a PIN and many people would just assume it is a number ("number" is right in the acronym), and often a very short one. Most people didn't want to set a PIN at all; they'd been nagged about setting one and then got nagged again and again to reenter it.
It was not clear to most people that their highly sensitive info was being uploaded to the cloud at all let alone that it was only protected by the PIN. I wouldn't be surprised if a lot of people picked something as simple as possible.
Their announcement post says "at least 4 digits, but they can also be longer or alphanumeric", though maybe the feature had launched before that was written? https://signal.org/blog/signal-pins/
> Signal can and should append some noise to the images when encrypted (or better yet, pad them to a set file size as suggested by paulryanrogers in a sibling comment) to mitigate the risks of this attack for those who do have threat models that require it
Adding padding to the image wouldn't do anything to stop this "attack". This is just watching which CF datacenters cache the attachment after it gets sent.
Right, my bad on the ambiguity—I was replying to the OP's concern about image sizes, not the attack in TFA:
> It really doesn't matter that they can't see an image or video, knowing its size upfront or later (for example in response to a law enforcement request) is enough
Hello, I'm an organizer for a system to coordinate multiple mutual aid networks, many of which are only organizing by Signal & Protonmail exclusively because they think they're secure and private.
People who are doing work to help people in ways the state tries to prevent (like giving people food) rely on this tech. These are the same groups who were able to mobilize so quickly to respond to the LA fires, but the Red Cross & police worked to shut down.
This impacts the people who are there for you when the state refuses to show up. This impacts the future version of you who needs it.
Most people aren't disabled, yet. Doesn't mean they don't need us building infrastructure for if/when they become disabled.
Someone should tell anyone who seeks confidentiality that no email is secure. Use Signal and enable the data retention (i.e., automatic message deletion) feature. By itself that is not perfectly secure, but it's a start.
The people involved are likely all using Protonmail. So that would mean TLS for the connection to Protonmail with E2EE for messages passing through Protonmail.
Not sure that encrypted email in general would be less secure than, say, Signal. Since Signal is an instant messenger on a phone it might actually be less secure[1].
I think the threat model of enough signal users to matter is nation-state actors, and signal should be secure against those actors by default so that they may hide among the entire signal user population
>It gets more interesting when you think about the impact on groups. Sending an image to a group is enough for all devices associated with that group to be identifiable from CloudFlare's side,
Doesn't this open up the possibility to identify groups that have been infiltrated by spies or similar posers? If you use this method to kinda-sorta locate or identify all the users in your group and one or more of those users ends up being located in a region where you should have no active group members then you may have identified a mole in your network.
Just thinking out loud here since there's no one else home.
>If you use this method to kinda-sorta locate or identify all the users in your group and one or more of those users ends up being located in a region where you should have no active group members then you may have identified a mole in your network.
...unless they happen to be using a VPN for geo-unblocking reasons or whatever.
If you're in a group like this where people are seriously concerned about their location being discovered by governments or by their own contacts, anyone in that group who is not already on a VPN all the time is either ignorant or nuts.
yeah, the person you're referring to is confused because the Cloudflare HTTP service terminates TLS and presents a Cloudflare certificate, but that doesn't have anything to do at all with Signal's E2EE which is not based on HTTPS PKI
Last time I used Cloudflare I think their settings default to only "Origin SSL/TLS" (or whatever they call it), which wouldn't encrypt anything between Cloudflare and the origin, it would only encrypt data between Cloudflare and the end-user/browser.
But the Signal client encrypts images before sending them to the Signal server. If it padded out the images at that point, the images would all be indistinguishable from each other unless Cloudflare were actually able to break the encryption (which would completely undermine the entire security model).
Ah yes, I'm sorry, I mistook the context. If Signal encrypts the images E2E, you're right that it wouldn't matter what Cloudflare does, especially if padded.
TLS doesn’t matter for End-to-end encrypted stuff though, you could exchange the data over Telnet and it would still be secure. The content itself is already encrypted before being transmitted and can only be decrypted by the receiver.
AFAIK the attack described by OP only works if the attacker knows the (randomly generated) URL of the image, which probably means they have a Signal client that can decrypt the image already. So the secrecy of the content is not at issue. The question is whether some specific person has received the same image, and from where.
Part of his attack requires disabling the cache on his (sender) side so that he doesn’t pollute the cache. That implies that both sides of the conversation share the same URL, which means Cloudflare could assume two IP addresses requesting the same URL on the Signal attachment domain are participating in a shared conversation.
Yeah, that's a problem. It is leaking metadata, not content.
Ideally, the image should be padded, encrypted with a different key, and given a different URL for each user who is authorized to view it. But this would increase the client's burden significantly, especially in conversations that include more than two people.
Say for example that you're an investigating agent in regular contact with someone.
A single data-point wouldn't mean anything. However, a sequence of daily image retrievals might tell you that they spend 90% of their time in WA and 10% of their time elsewhere.
That information alone still might not mean anything, but if you also have a specific suspect in mind, it may help confirm it. Or if you have access to the suspected person directly, if you're able to also befriend their "clean" profile, you might be able to pull the same trick and correlate the two location profiles.
De-anonymisation isn't about single pieces of information, but all information helps feed into a profile to narrow suspects or confirm suspicions.
( By "agent" I just mean a person, not an AI agent nor Law enforcement, who could presumably just get the information more directly from cloudflare. )
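The "90% of their time in WA" estimate above is just a tally over repeated observations. A minimal sketch, assuming you already have one data center code per retrieval; the codes in the usage note are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable

def location_shares(observed_colos: Iterable[str]) -> Dict[str, float]:
    """Fraction of observations attributed to each data center."""
    counts = Counter(observed_colos)
    total = sum(counts.values())
    # An empty input yields an empty dict, so there is no division by zero.
    return {colo: count / total for colo, count in counts.items()}
```

For instance, nine `"SEA"` observations and one `"DFW"` give shares of 0.9 and 0.1. The profile is only as good as the assumption that the serving data center tracks where the person actually is.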
you don't have to "befriend" them. you send a friend request because that defaults to a push notification for users with the discord app on their phone. Now, with signal, i don't use it so i don't know how initial chats start, or whatever. The discord one is 0-click because the PFP in the friend request is the payload delivered via PUSH.
And to someone else's point - they had to block the request on their end with a MITM to do the 1-click version on signal. No such MITM is needed with the friend request.
As an aside, one time i got doxxed hard in an IRC channel with several hundred active users. I had a suspicion of who it was, and i knew they lived in chicago. So i "accidentally" sent a link to "screenshot proof" that was hosted on one of my domains. there was 1 immediate click. instant. Chicago. "accidentally" because it looked like i pasted an email body.
Packed the real screenshot and a complaint to the ircadmin. they said "and so you dox them back?"
There's probably at least a few instances where you send someone you think is American a picture but it gets cached in Moscow, or vice versa. Or you post a meme to a Californian left-wing group and it gets cached in DC. Not hard to imagine situations where getting an unexpected rough location could be a valuable signal.
>Or you post a meme to a Californian left-wing group and it gets cached in DC. Not hard to imagine situations where getting an unexpected rough location could be a valuable signal.
Not really. Any public meme group is inevitably going to be monitored by intelligence agencies, and you should assume as such. Even if it isn't, I can imagine agitators from the other side joining the group with a Russian VPN to poison the well. If there's a private group of people that you supposedly trust, any competent mole is going to be using device/network level VPN to cover their tracks. Otherwise they're 1 click away (eg. if someone shared a link) from an opsec fail.
I would bet money almost no public meme groups are monitored by any intelligence agencies. And the few that are monitored are mostly so only in the sense of being casually co-opted by state-sponsored trolls, with almost no attention from actual intelligence agency staff (in the way this thread implies, with investigations and deanonymization and such).
I'm sure they're "monitored by intelligence agencies" in the sense of having a line in a database/report somewhere (that probably no-one reads). If the technique mentioned in TFA can be used automatically (and I see no reason it shouldn't) then it will probably be incorporated in due course (if it hasn't been already) - it doesn't have to be 100% accurate, it's just one more datapoint to add to the mix.
"Deanonymization" doesn't have to refer to a full exact address. There are people who wish to conceal which country or region they live in, which this cripples.
There was a real example of that amount of information being relevant in the Silk Road investigation. Ulbricht accidentally revealed his timezone early on, which was useful to US authorities since it narrowed him down to being in the US, whereas without that information he could have been from anywhere in the world.
Anyone who wants to conceal what continent they're on will also be using a VPN 24/7, or will have the proxy setup in Signal (AKA running 24/7), which defeats this.
Yep: If your threat model includes an attack like this and you're not always on a VPN already, you're likely already compromised.
This is a neat demo, but it should not fundamentally alter the way that anyone is using Signal. Either it doesn't matter to you or you already have mitigations in place.
> If your threat model includes an attack like this
The problem is, nobody's threat model includes state level attackers, until one day it does.
Back when Ulbricht was publicly asking questions using an easily uncovered identity, he wasn't thinking that in a few years he'd have the full force of every relevant TLA in the US (and Five Eyes/14 Eyes) trying to track him down.
But he also chose to go on and found a darknet narcotics service. Most people don't do something like that.
Yes, it's vogue right now to speculate that what you're doing right now could suddenly become illegal in a new administration, but if that happens tomorrow, most of us would be one of hundreds of thousands who are all in the same boat. For that reason, most of us won't get targeted retroactively for behaviors that were legal at the time, and we have the option to reevaluate our security posture when the political landscape changes.
But yeah, if you're actively speculating about starting an illegal service today, you should definitely have a better security posture than Ulbricht did.
> but if that happens tomorrow, most of us would be one of hundreds of thousands who are all in the same boat. For that reason, most of us won't get targeted retroactively for behaviors that were legal at the time
I'm sure that would be part of any oppressive government's plan. They wouldn't go after people for their past "transgressions" as long as they keep their heads down, do as they're told, and don't cause any trouble. At that point you're morally compromised.
> Yes, it's vogue right now to speculate that what you're doing right now could suddenly become illegal in a new administration, but if that happens tomorrow, most of us would be one of hundreds of thousands who are all in the same boat
I'm probably more paranoid than needed, but I'm way less sure than you seem to be about being able to hide as one of a few hundred thousand needles in the US public haystack.
I, for one, would be terrified right now if I were the child of illegal immigrants. The hateful portion of the hard right are gleefully looking forward to ICE rounding up hundreds of thousands of people.
You should probably be concerned if you were publicly pro-choice a few years back. Or if you came out as trans. Or got gay married. Or any of probably hundreds of other things that most people would have thought perfectly safe and socially reasonable in the recent past, which are looking much less so today.
When I was ~15 and this was ~2004, some friends and I ran a forum with a lot of users and did some bad things where we would track down repeat banned users and screw with them. (In our defense, they were screwing with us.)
We used everything, from browser fingerprinting (and EFF only made the world aware of it 6 years later), to looking them up in databases, tracing every piece of digital evidence they left, etc.
Every little thing counted. What I learned is that people leave a lot of traces and you can collect these traces to dox them. The way you write is even sometimes fairly identifiable.
It's not stretching it. The expectation is that Signal does not reveal any observable aspect of your IP address or location when receiving messages on it.
Whether this specific level/type of deanonymization is a problem for your particular use case is an entirely different question. Personally, I wouldn't even care if mutual contacts were to see my IP address outright (and they do for calls), but I'm not every user.
I don't care if users see "my" ipv4 because cgnat. I think i don't care if they can see my ipv6 because each machine gets a /64 to itself, that's the logic, right?
But my PBX and my matrix server both use coturn. Our 10 user "private" PBX we have to VPN into a fortigate in a DC to use, but to my understanding, there's literally no way to eavesdrop on those calls without already compromising the server it's running on, and if that's the case, no extra VPN steps or whatever will help.
anyhow even with a real, publicly routable IP, stock windows 11, stock macos (used to be true), and most linuxes won't get compromised by stuff like backorifice or whatever else l0pht put out as "remote administration tools". that is, there usually aren't any listening ports on a public IP these days. Shield's Up!
> to my understanding, there's literally no way to eavesdrop on those calls without already compromising the server it's running on
That's probably correct (with the caveat that I suspect NSA/FSB/MSS/Mossad/whoever can reasonably be assumed to have backdoored Fortinet)
There is still the problem that an attacker with "global passive observer" capabilities (which almost certainly includes most non 3rd world nation states, and probably a few of the more problematic 3rd world ones too) can still do traffic analysis to uncover your social network (or criminal/terrorist/whistleblower/journalistic network) by identifying the call traffic endpoints.
> I think i don't care if they can see my ipv6 because each machine gets a /64 to itself, that's the logic, right?
I suspect you're looking at that wrong.
It's each internet connection that gets a /64, not each machine. Your ISP hands you a /64 and you can do whatever you like with it on your home(/corporate) network.
So you can choose from about 18 quintillion (2^64) IPv6 addresses for any machine behind your ISP/internet connection, but the top half of your IPv6 address uniquely identifies that ISP connection, and they can tie that to your account/payment details, with 4 billion times as much precision as an IPv4 address.
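For anyone who wants to check the arithmetic, here is a quick sketch using Python's standard `ipaddress` module. The prefix is from the IPv6 documentation range (2001:db8::/32), not a real allocation.

```python
import ipaddress

# A /64 taken from the documentation prefix 2001:db8::/32.
home_prefix = ipaddress.ip_network("2001:db8:1234:5678::/64")

# Number of addresses available inside a single /64.
addresses_in_64 = home_prefix.num_addresses   # 2**64, about 1.8e19

# Size of the entire IPv4 address space, for comparison.
ipv4_space = 2 ** 32                          # about 4.3e9

# How many whole IPv4 internets fit inside one /64.
ratio = addresses_in_64 // ipv4_space         # 2**32, about 4 billion
```

So a single /64 holds about 4 billion times as many addresses as all of IPv4, while its upper 64 bits still point at one subscriber line.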
Exactly. Especially when considering that Signal was often advertised as that *one* privacy friendly open-source messaging solution in a world dominated by data-collecting demons like WhatsApp, etc. I don't think even WhatsApp lets such status details leak; notwithstanding whatever they might be doing with the user data on the backend.
If I know someone on Signal I can now check if they’ve left the country.
Or send this to a bunch of Signal users, one of whom you suspect is a particular person. If you know that the person you are looking for is going to travel, you can send it once before and once after, then see which of these users were in the home city and subsequently in the destination city.
Say I send a message to someone who has a phone with push notifications enabled, showing message previews. Will the phone still be connected to the VPN when it wakes up to display the message? Because my iPhone doesn't seem to stay connected to my VPN when it sleeps, at least not reliably.
There really should be a "never use the internet without VPN" mode on devices.
I don't see how that can work for the push packet itself, cause I thought that's specially handled by some low-power hardware on the phone while the main parts are shut off. Unless that hardware is also managing the VPN connection, which I doubt.
So if there's no always-on hardware maintaining that VPN connection, probably the phone is going to wake up without it. And even if it auto-reconnects, it'll probably load stuff before it's connected to the VPN.
Yeah, the VPN could probably hide your location only if mobile data is turned off, so the packet never hits the mobile network and everything goes over Wi-Fi calling/messaging.
The real attack is that a law enforcement agency can trivially subpoena CloudFlare with the attachment URL, and they will hand over the IP address of the recipient of the image along with whatever other requests they made through the CDN, which can pretty precisely and rapidly de-anonymize you.
Caching attachments at a single nice, big, juicy honeypot like CloudFlare is one of the reasons Signal's privacy guarantees don't feel totally solid to me. I get that it's pragmatic, but feel there must be a better way.
Does the caching occur even if both users are online when the attachment is sent?
I wonder if Signal should implement a "simple" mode that would deactivate most features in order to reduce the attack surface for people who really think they are being targeted. Would that be a good idea?
Combined with other information, it may identify someone reliably, just like you can with zip code, age and gender. For example, if you know this person is part of a group with members in several locations, or if you can corroborate someone's movements, etc.
For example, imagine someone suspected of sharing sensitive information with a journalist. They might have a short list of suspects, and use this technique to confirm which one it is. They might identify which journalist it is - maybe only a limited number cover this beat.
That doesn't tell you whether that journalist is investigating you. Identifying them as the recipient of a Signal message from a suspect is valuable information.
Not really. It's only true if the bits are uncorrelated, and you can acquire additional bits of information. I don't see how you can go from "this guy on the internet lives near Albuquerque, New Mexico" to "this guy is Walter Hartwell White, and lives at 308 Negra Arroyo Lane, Albuquerque, New Mexico, 87104" without massive opsec failures.
If you want to extend the analogy, Gus Fring's threat model for RFP contractors at the superlab required flying people into the United States and driving them for days before reaching the final destination. i.e. If you aren't selected for the final proposal, the most you should know is the lab is "somewhere reachable by driving from the United States".
Locating the superlab to within 800 miles would break Gus' threat model.
Combined with the information the police have, which is that a new form of "blue meth" is spreading across the American southwest, a reasonable conclusion would be that the "underground superlab" is where the meth is being manufactured. It's independent corroboration of a major manufacturing operation occurring in the United States in the exact region where a new drug is taking off.
This is useful, since it helps rule out the meth being smuggled in from Mexico. It also makes the lab a high priority target, because a DEA agent investigating doesn't need to liaise with a foreign government, and you can secure a domestic prosecution + American prison time instead of attempting to extradite the cooks.
It also allows me to send a detailed memo about the superlab to ASAC Schrader's office in Albuquerque telling him about a threat in his jurisdiction, rather than circulating a brief summary about this superlab in the weekly intelligence briefing sent to all high-ranking DEA officials they probably don't read.
You can plot the timestamps of every message, read receipt and emoji reaction, which gives you the timezone and hints at work schedule, commute duration and vacations.
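To make the timestamp idea concrete, here's a minimal Python sketch. It assumes you've already collected timezone-aware UTC timestamps of the target's messages, read receipts and reactions. The `guess_utc_offset` name and the "quietest six hours are roughly 01:00 to 07:00 local time" heuristic are my own assumptions for illustration, so treat the result as a hint rather than an answer.

```python
from collections import Counter
from datetime import datetime, timezone

def hourly_histogram(timestamps):
    """Count events per UTC hour (index 0-23). Timestamps must be tz-aware."""
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    return [counts.get(h, 0) for h in range(24)]

def guess_utc_offset(timestamps, sleep_start=1, sleep_len=6):
    """Guess a whole-hour UTC offset from activity times.

    Heuristic (an assumption, not a fact about any user): the quietest
    sleep_len-hour window corresponds to local time starting at sleep_start.
    """
    hist = hourly_histogram(timestamps)
    # UTC hour at which the quietest window begins (wraps around midnight).
    quiet_start = min(
        range(24),
        key=lambda s: sum(hist[(s + i) % 24] for i in range(sleep_len)),
    )
    # local = utc + offset, so offset = sleep_start - quiet_start,
    # wrapped into the range [-12, 11].
    return (sleep_start - quiet_start + 12) % 24 - 12
```

With a few weeks of data the quiet window should stand out, but shift workers, night owls and people who travel will throw the guess off.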
Often people will post photos or have profile pictures.
Say you have a photo taken at a random mcdonalds. That'd be 36'000 locations. Imagine cloudflare location and timezone help you narrow it down to new mexico. That's 80 locations. Small enough that you can look at every single one using street view and check where the photo actually was taken.
Now you can subpoena the McDonald's cctv footage and figure out who sent that picture.
You can almost certainly narrow down the McDonalds with a wide variety of things - this example is fairly contrived.
If you can see outside of the McDonalds for street view to be usable, you're almost certainly able to determine what country it is in, and potentially the exact location, depending on what is visible outside.
If it's a picture that shows the menu, well, street view isn't likely to be super useful, but you'd have a trivial time figuring out what country it is in at that point - menus vary from country to country, even when they are still in English.
New Mexico has relatively few McDonald's restaurants because New Mexico has a fairly low population - only 2.1m for the whole state. With that in mind, it seems unlikely that Cloudflare has a close enough POP for you to be able to specifically decide it's NM.
If I can see enough for Street View to be able to confirm location, it seems like I can just search via the data there and get far more narrowed down results. If I can see a Burger King and a Best Buy outside from the picture, I can just use one of the many mapping services with APIs to get a list of all McDonalds locations within a tenth of a mile of a Burger King and Best Buy and look through a smaller list. If I'm confident of the time zone, like you suggest we should be able to be, then that's an even smaller list.
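A rough sketch of that kind of filter, in Python. It doesn't call any real mapping API; it assumes you've already pulled lists of (latitude, longitude) pairs from one, and the `candidates` helper is a name I made up for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))

def near_any(point, others, max_miles):
    """True if at least one point in `others` is within max_miles of `point`."""
    return any(haversine_miles(point, o) <= max_miles for o in others)

def candidates(targets, *landmark_sets, max_miles=0.1):
    """Keep targets that have a member of every landmark set within max_miles."""
    return [
        t for t in targets
        if all(near_any(t, landmarks, max_miles) for landmarks in landmark_sets)
    ]
```

Called as `candidates(mcdonalds, burger_kings, best_buys)`, it returns only the restaurants with both landmarks within a tenth of a mile. The brute-force comparison is fine for a few thousand points; anything bigger would want a spatial index.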
I'm not saying this attack is useless by any means, but I don't see a world where the sharing of the pictures to begin with isn't the most significant opsec failure and doesn't open you up to being de-anonymized in a myriad of other ways.
>Often people will post photos or have profile pictures.
>Say you have a photo taken at a random mcdonalds. That'd be 36'000 locations. Imagine cloudflare location and timezone help you narrow it down to new mexico. That's 80 locations. Small enough that you can look at every single one using street view and check where the photo actually was taken.
Sounds like the bigger opsec failure is posting the pictures, and leaking the Cloudflare POP only makes the search slightly easier.
Repeat the attack daily for a few weeks and you might get a pattern of movement. Of course if the target hasn’t left their general area then this won’t help. But if you’re a nation state watching a target move between multiple international locations, you could match this up with passport travel data to significantly reduce the anonymity set.
Seems contrived. What type of a person cares about deanonymization attacks and nation-states trying to find him, but doesn't have an always-on VPN? Even without this attack, not using a VPN means you're 1 wrong click/tap away (if you accidentally clicked on a link) from leaking your IP.
Right, agreed that VPN is the primary mitigation against this from a user perspective. But opsec is hard, especially when the attack can be triggered by a notification when the victim might not be expecting it and might not have VPN enabled (e.g. maybe they only enable VPN when using Discord).
(But notifications are already a bad idea for opsec anyway.)
That's why the attack is contrived. If you have poor opsec you don't need this attack at all. You can probably get the victim's exact IP by getting him to click on a link, or sending him an email. If he has good opsec he's going to be using a VPN that renders this attack useless. For this attack to be valuable you need a guy who has such good opsec that you can't get his location any other way, but for whatever reason isn't using an always-online VPN.
Agree. Though a valid concern might be that a victim uses Signal because of E2EE, thinking no third party is involved in delivery, without knowing or thinking about the CDN being used.
Send a picture to multiple accounts, perhaps on different services; the links that are cached at the same data center can be more confidently believed to be related.
That's why federated setups such as Matrix are better. It is much harder to deanonymize a set of users on different servers in a group chat.
Looks like it's possible to hit 2 datacenters due to load-balancing, which would narrow it down a bit more. Suppose you do this repeatedly as the target is moving around, hitting even more datacenters.
Imagine sending a friend request to bin Laden's videographer and getting a reply from Pakistan while your entire military is looking for him in Afghanistan?
There's definitely cases where this is going to be immediately used. Shit, just using it to scrape Cloudflare for additional metadata on everyone from other user table leaks is probably valuable data. Even triangulation over time as they move around is going to get a more precise result. Maybe you find a vulnerability that takes that cloudflare node offline and run it again, repeat until you've got a fairly small radius they could be in.
This is not unique to Signal. URL strings can contain identifying information regardless of where they are shared or posted. For example, if you send a link that ends with a string of characters, these may correspond to a geographic location or browser settings. Blogger URLs used to be geolocated, such as .ca for Canadian viewers. It is always safe to strip out unnecessary characters if you're paranoid.
Cool writeup with some interesting techniques and approaches!
I'll echo the other comments and say "deanonymization" is stretching the definition of the word, along with "grab the user's location", as it isn't anything near precise. 150 miles is approx. a 2-hour drive on the highway from Atlanta, GA to Augusta, GA. In that radius, there's probably 700,000+ people.
I do think the auto-retrieve attachment feature of Signal is slightly concerning, as for a private messenger I'd expect there to be an option to turn it off (like turning off JS in Tor). I don't know if I'm not looking deep enough, but there doesn't seem to be a feature for that.
Signal appears to take a useful-by-default approach that balances privacy and ease-of-use in order to encourage adoption by the masses, I'd assume most people that are really concerned are hardening Signal, similar to what is in this guide: https://www.privacyguides.org/articles/2022/07/07/signal-con... . They've always recommended a VPN / proxy + a modification of settings for more high-security scenarios.
Caching isn't going anywhere, and neither is CloudFlare. The DoSing days of old in P2P multiplayer lobbies with exposed IPs seemed to carry more of a threat than this. CloudFlare's response seems to be the best out of the 3. Caching sensitive information is never recommended, and the onus is on the application doing the communicating to tell their CDN / middle-service to not cache specific items.
> "deanonymization" is stretching the definition of the word, along with "grab the user's location", as it isn't anything near precise.
You'd think so, but you would be surprised how quickly this adds up to other details people share, like "oh I just drove 15 minutes to get Starbucks" or something to that effect, small things that eventually add up to a precise location over time.
Yes, but if social engineering is involved and tracing back through user conversations across a platform, it's hardly a vulnerability, let alone one deserving of a bounty. The way this is currently functioning is intended functionality, and can be further locked down depending on the user's threat model.
This can essentially be classified as opsec failure for the Signal user. If they're trying to hide from a hit in a 300 mile radius, they've got bigger problems to worry about, and should already be using a VPN setup.
Every time you click on a link your external IP address is exposed, is this a vulnerability? Being online without a VPN / proxy is inherent consent to have your external IP & other required items to be shared with services / middlemen.
When it comes to Discord, if you have this strict of a threat model and you're still using it, idk what to tell you.
The comment says:
Every time you click on a link your external IP address is exposed, is this a vulnerability? Being online without a VPN / proxy is inherent consent to have your external IP & other required items to be shared with services / middlemen.
The fact that a user's IP is exposed when they click on a link is only relevant to the original post if a user would do this automatically and without realizing. The original post alleges that they can send someone a message on Signal and have the user automatically and somewhat unknowingly load a resource from a server. Sure, the author doesn't claim they have much control over the resource or the server, but they do show how you can check which server the user accessed and how that leaks information about the location of the user to a certain extent.
Blaming the user is sometimes what it boils down to. Security includes a balancing act that involves usability, and Signal is firstly targeting the masses, but includes settings that can be configured for high-risk scenarios.
This "vulnerability" requires the user to have none of the normal things a person with a more extreme threat model would have already configured. EZPZ guides online on locking down Signal.
It's just like an iPhone. They don't ship with Lockdown Mode enabled by default, as it hurts the average consumer's usability. Signal at minimum will ensure no one is snooping on your messages, and it's up to the user whether they want to take that further.
If your definition of not providing security is allowing someone to know they exist on a continent, then that user's ISP has performed terribly as well since they aren't bouncing their signal around the world by default.
> Blaming the user is sometimes what it boils down to.
At least we agree about your argument. :)
> Signal at minimum will ensure no one is snooping on your messages, and it's up to the user whether they want to take that further.
Signal also secures metadata, including the participants in the conversation. That is undeniable - they have gone through considerable development investment to provide that feature.
> that user's ISP has performed terribly
Now we're blaming the ISP. If your app doesn't work with your users and ISPs, who does it work for? And how does a non-technical end-user know whether or when to trust you?
> When it comes to Discord, if you have this strict of a threat model and you're still using it, idk what to tell you.
I mean, you just never know... I've seen a lot of wild things, I've seen what drives people to doing crazy things. Just look up the "Deadly Runescape E Dater" who flew from the US to the UK to stab the girl he e-dated.
You can disable the auto-download. Settings > Data and storage > Media auto-download, you can choose what to auto download for mobile data/wifi/roaming.
Thank you! That's what I get for quick scrolling through the settings. I for sure thought it would have been under Privacy (for this concern), but that makes sense too.
So, just to confirm my understanding, if one goes into those settings and disables all auto-download, that helps- but, then a user will manually download images, correct? Are they still vulnerable to this issue then at that time?
A user might download images and yes, if they download images Cloudflare will show which datacenters have cached that image. They might also install an APK you give them or run that taylor_swift_concert.mp4.exe as well.
If I host an image on Cloudflare and put the URL here, I'll know which CF datacenters are near HN users who bother clicking the link as well.
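For what that check looks like in practice, here's a small Python sketch that only parses response headers you've already collected from probes in different regions (it makes no network requests itself). It relies on my understanding that Cloudflare reports cache state in `cf-cache-status` and appends an airport-style data center code to `cf-ray`; the function names and the sample values are hypothetical.

```python
def cache_probe_result(headers):
    """Return (cache_status, colo) from one response's headers.

    Assumes cf-ray looks like '<ray id>-<colo code>'. Either value is None
    when the corresponding header is missing or malformed.
    """
    h = {k.lower(): v for k, v in headers.items()}
    status = h.get("cf-cache-status", "").upper() or None
    ray = h.get("cf-ray", "")
    colo = ray.rsplit("-", 1)[1].upper() if "-" in ray else None
    return status, colo

def colos_with_hit(responses):
    """Given header dicts from several probes, list colos that reported HIT."""
    hits = set()
    for headers in responses:
        status, colo = cache_probe_result(headers)
        if status == "HIT" and colo:
            hits.add(colo)
    return sorted(hits)
```

Note that a HIT only tells you somebody routed to that data center fetched the URL recently. It says nothing about who, and as others in the thread point out, the data center isn't necessarily the geographically nearest one.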
Cool! Contrary to some of the other posters I think this definitely counts as deanonymization, or at least is close enough. How anonymous would satoshi be today if we had his location to within 250 miles?
Repeated applications of this attack (maybe disguised somehow?) could let you track someone’s travel over time, and it usually only takes 4-5 zip code sized locations to uniquely identify someone.
The counter point is that anyone who cares about being anonymous is using methods to disguise their identity that cannot be compromised by this attack, e.g: a VPN. Plus, there are much more effective versions of this attack, like sending a link to an endpoint that you control -- getting someone to click a link isn't hard if you're considered trustworthy enough to send them notifications. And less technical versions, like correlating when the user is online vs. offline with timezones around the world.
The method that both Apple and Cloudflare use in their own privacy software (iCloud Private Relay for apple, WARP for Cloudflare) is specifically based on the idea that your region is not information that reveals your identity. If you enable Apple Private Relay, your origin IP will be obscured but the IP your traffic is routed through will be in the same country -- same principle.
On iCloud Private Relay, go to settings and select “use country and time zone” instead of “use general location.”
Now you’re no longer “within 250 miles.” Hell, my phone geo-IPs everywhere from Louisiana to New Jersey, which are not even “in my time zone,” but there you go.
This setting was pissing Meta/Facebook off big time because they also couldn’t narrow me down to a precise geographical area, resulting in much nagging and whining about “was this you signing in from [shreveport]?” and frequent account lockouts, password resets, and endless requests to approve my logins from a device that’s already logged in, before I finally said to hell with it and deleted FB a few days ago.
I figure if a privacy setting makes Meta mad, then it’s... probably... a good setting. Must really irk them trying to sell location-relevant ads when my state changes every other time I unlock my screen.
It’s a combined behavior of using private browsing and refusing to install their app, thereby giving them a permanent supercookie no matter what my IP is, so if you don’t like the sound of this it [might not] affect you if you use their apps. “X” does it too, just look up “inferred identity+ twitter” on google.
I’m editing out a tall claim in the last paragraph of this for some other time when I’m less tired and have sources next time we’re on the subject.
> The counter point is that anyone who cares about being anonymous is using methods to disguise their identity that cannot be compromised by this attack, e.g: a VPN.
Yes unless Apple is doing Apple things and ignores VPNs for things like push notifications…
I am not sure I understand what you mean by "trustworthy enough to send them notifications". Do you need anything other than one's phone number to send them a signal message?
The recipient would need to have this enabled, though it is by default. You can deactivate allowing others to initiate chats with you from your phone number (Settings > Privacy > Phone number)
(I don't advocate attempting to find and publish his name and address, since it'd make his life difficult, but it's still very interesting in the abstract as a curious unsolved mystery for all these years despite the number of eyes on it.)
I think the more important question is how many people in the world don't live within a 250 mile circle around New York? An investigator could potentially cut their geographical search down by 95%+.
Let's say they travel between NY and LA, how many sources of data will you need to know who was in NY on a specific date and LA on a second date? Feels like only the government can reasonably locate that.
FWIW if it's the government, wouldn't they be able to just get direct access to Cloudflare logs - in real-time even - and thus observe and track the specific incoming connection to fetch the cached image?
How many people live in a 250 mile circle around their Cloudflare POP?
Which Cloudflare POP I hit depends on which RSP I use. In the country I live in, our biggest RSP peers with Cloudflare in a neighboring country (as it is much cheaper for Cloudflare to send traffic via that RSP's peering exchange there). So something like 40% of traffic will seem to be from an entirely different country than reality.
My RSP is a small RSP which until fairly recently only had two POPs in the entire country. So regardless of where you lived, customers of my RSP would have traffic exiting onto the internet via only one of two exit points. Rural users would seem to be coming from one of the two largest cities in my country even if they are easily >250 miles away from their particular POP. They do peer with Cloudflare but obviously only at the locations where they and Cloudflare are in the same city (and I'm not sure this is the case -- it is possible all national traffic to Cloudflare actually goes via the one POP in our biggest city).
The only reason this attack identifies the city I happen to be in is because I live in the same city as my little RSP's biggest POP and Cloudflare happens to peer with that RSP at that POP. Where I am is a large city, so that doesn't narrow things down very much -- but even worse is that whoever is looking for me would actually need to look anywhere in my country.
I don't think I am a unique case, as internet routing is rarely the most direct path for various technical, financial, political, etc. reasons.
De-anonymization is definitely stretching the reality of what this 'attack' is capable of IMHO.
Still quite anon. He almost certainly used a VPN, and if he didn't he likely lived in a major city which included thousands if not hundreds of thousands of capable engineers. If it said he was in SF during some messages that would tell us literally nothing.
Not sure why so many top comments dismiss the severity of this. This is exactly the type of attack that gives law enforcement or a malicious actor a way to establish proof of whereabouts.
I would guess some are just jealous of his age, but some do find the claim of de anonymizing to simply be overblown given it doesn't tell you nearly enough to find anyone except in very niche cases. This "attack" is easily defeated with a VPN or living in any major city.
You don't need to live in a major city. Cloudflare is never going to set up a caching proxy for a hamlet in the desert; you'll always be part of a huge group that a given caching proxy serves. The attacker can be happy if they can narrow the recipient's location down as much as to a single country
Posters are missing the point by projecting themselves into the scenario. Yes, it probably isn't a concern for someone living in the US or the EU. The calculus is different if you live in a smaller country, a politically sensitive area or are involved in activism against an authoritarian state.
Even for individuals in those large, developed suprastates, it opens the door for catfishing and other social engineering approaches.
Someone on GitHub called him out for making a Twitter account in 2017, since he'd have to be 8 years old at the time... I don't see what's so unbelievable about an 8yo making a Twitter account.
Interesting you touched on his age. I got extremely curious: why did the OP do such a flex (assuming they are telling the truth)? The first sentence is such a weird brag that it felt suspicious. The report is highly technical and extremely well written. We're either dealing with a pure genius or a fraud. But why would a genius flex? Doesn't make sense.
I can believe a very talented 15yo pulling this off. But the number of anonymous "I'm 15 and this is my impressive feat" posts on HN made me wonder if it's just a joke.
I don't know what you think is genius about any of this, but you're right, the flex is odd. It's something I've been seeing more and more of lately, and I find it off-putting, because, Back In My Day, I never had such a phase, where I felt like I should be given more credit for my 1337 h4xx0r skillz, because I was in high school or whatever—and I don't remember anyone else doing it, either.
I can only assume this is a consequence of modern social media having shifted the Internet from being a bunch of pseudonymous people making and sharing stuff, to everything being myopically focused on one's identity first, and what they do second (as is literally the case here).
And it looks like it works to achieve its desired effect, too—a significant portion of the comments here are congratulating the guy for doing such a thorough technical write-up, given his age. Maybe this is just me being a grumpy “old” man now, but I would've found that condescending when I was his age, and would've rather concealed my age than be condescended to as such. But, to each his own, I suppose.
I find it genius to be 15 and be in a position to find this issue and write this report. As I understand it, there is a ton of context and knowledge packed into this write-up, which I can't possibly imagine having been able to grasp at the age of 15. That is not to say it's impossible, but it is very hard, and that's why I characterise it as genius.
I believe most people (me included) dismiss the OP's claimed severity, as if it is being oversold. I see a balance of opinions saying "great find, but not as critical as claimed" so they don't seem dismissive. It is important to correctly classify the severity of issues. Proof of whereabouts is not deanonymization, especially when the abouts are so loose
They dismiss it for the same reason people dismiss disruptive new technology - they are uncomfortable with it. It's a signal (ha) that the threat is very real.
First dismiss it and see if the problem is still there in the morning. Hope that before then, someone finds a reason it's not a problem. Anyone?
I think that's just a quirk of HackerOne's username system. The username daniel was previously owned by another account (now known as daniel-hamid) which submitted a bug to Adobe. If you go through @hackermondev's tweets (starting in 2018) they are without question a kid (making games in Roblox and Minecraft) and then started to show an interest in hacking in 2020 (which lines up with when they created their HackerOne account). The claim of being 15 years old is plausible (presumably with parents / guardians who are accomplished in technology).
Why has Signal even enabled caching for those URLs? The most common case is going to be that the attachment is downloaded once, and that's it.
I would even expect that Signal wouldn't allow you to download it more than once, and would immediately delete it after the first successful download. Well, ok, maybe the client fails mid-way through, so allow some grace period for a re-download. But I can't imagine that would be the common case either, and so disabling caching on their CDN would fix this issue, and hopefully not increase their costs much.
At any rate, "deanonymization" is a bit clickbaity here. Narrowing someone's location to within 250 miles or so isn't great, but it doesn't deanonymize them.
Edit: I didn't think about the case where an attachment is sent to a group chat, where multiple people will be downloading it. But in that case wouldn't the attachment be encrypted individually for each person in the group? I'm not sure how this works, of course.
Signal's default setup is more usability focused while supporting E2E, and less about tinfoil hat threat models about being present on a continent you're a citizen of.
The items you mentioned can essentially be configured, for those that want the insane level of privacy / security. Messages can be auto-deleted 30 seconds after being seen, a proxy can be configured to route all your traffic through it, and tons of other things can be done to customize it more to the user's liking.
I'd imagine they're caching it because of egress costs. File attachments, voice mail, video, etc. can all add up.
> Signal's default setup is more usability focused while supporting E2E
If images/attachments were e2ee, this problem probably wouldn't exist, right? or are the images on cloudflare encrypted?
Edit: I should clarify. I didn't mean the encryption itself fixes the problem, but rather that: If this were handled like the text messages we send (not via cloudflare CDNs) then this wouldn't exist. I get that attachments are quite some bytes bigger than text but shouldn't the security guarantees be the same?
I actually also wondered about this, because if Signal did not encrypt attachments and delivered them via CloudFlare, that would suck, as CloudFlare could just look into all of them.
It seems that signal is indeed encrypting all attachments and therefore the encrypted attachments are cached and served via CloudFlare.
From what I know* (heavy on the asterisk there), they are. I'm guessing at their setup at this point, but it sounds like the "large" data is probably being stored (while encrypted) in a different way / separately than the messaging. Since it's supposedly E2E (not gonna pretend I've hand verified it), it's decrypted on the device, but it needs to be grabbed in the first place from said separate place.
So, I'm guessing the images are encrypted where they're stored. And from his post it sounds like it doesn't happen with the messages, so the motivation for using CloudFlare probably is around egress pricing, or they could be using CloudFlare R2 for storage as well.
Unless I'm missing something, this seems like an incredibly long winded way to check the users IP location?
For example, connecting to a VPN and checking https://cloudflare.com/cdn-cgi/trace
gives me `colo:CPH` (Copenhagen) which is far from my nearest CF datacenter (geographically), closer to the IP location from my VPN provider (Oslo) but still not particularly close?
If I don't use a VPN, I don't even get the capital city of my country (which I'm in right now), I get a colo approx 250 miles north. So I also dispute that Cloudflare always returns the "nearest available datacenter".
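If anyone wants to repeat that check from a few networks and compare, here's a small Python sketch. As far as I know the trace endpoint returns plain `key=value` lines, one per line, with the data center under `colo`; the helper names are mine and the sample text in the note below is invented.

```python
import urllib.request

TRACE_URL = "https://cloudflare.com/cdn-cgi/trace"

def parse_trace(text):
    """Turn 'key=value' lines into a dict, ignoring lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def fetch_trace(url=TRACE_URL, timeout=10):
    """Fetch the trace page and return its parsed fields (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_trace(resp.read().decode("utf-8", errors="replace"))
```

For example, `parse_trace("ip=203.0.113.7\ncolo=CPH\nloc=NO\n")["colo"]` gives `"CPH"`. Running `fetch_trace()` with and without a VPN is a quick way to see how far your own routing strays from the nearest data center.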
Don't get me wrong, the write up is cool and certainly interesting - just not convinced on the real world applications here...
>just not convinced on the real world applications here...
As a piece of data alone, the results are probably not of significant use.
The real-world application (and potential danger) is when this data is combined with other data. De-anonymization techniques using sparse datasets have been an active area of research for at least 15 years and it is often surprising to people how much can be gleaned from a few pieces of seemingly unconnected data.
> The real-world application (and potential danger) is when this data is combined with other data.
That's exactly the point. In this case it's only really possible to de-anonymize people who take long distance trips. But based on two data points it might be possible to know which flight or train a person travelled with.
With three different data points it might be quite unique. For example you might find out somebody travelled from Italy to Norway on Monday evening and then to France on Wednesday morning. There are probably not so many people who did a trip like that, it might come down to only one (or a handful) people who fits this itinerary. With other data sources it might be possible to uniquely identify this person.
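A toy illustration of how the candidate set shrinks, in Python. It assumes the attacker already has some other dataset of (person, day, region) records, such as passenger lists, which this attack does not provide on its own. Everything here, including the names and the data in the note below, is made up.

```python
def matching_travelers(records, observations):
    """Return people whose records contain every observed (day, region) pair.

    records: iterable of (person, day, region) tuples from some other source.
    observations: list of (day, region) pairs inferred from cache probes.
    """
    seen = {}
    for person, day, region in records:
        seen.setdefault(person, set()).add((day, region))
    wanted = set(observations)
    # Subset test: keep a person only if no observation contradicts them.
    return sorted(p for p, visits in seen.items() if wanted <= visits)
```

If, say, three people were in Italy on Monday, two of them in Norway on Tuesday and only one of those in France on Wednesday, each extra observation cuts the list down. Real data is much noisier than exact matching allows, so this only shows the principle.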
>The real-world application (and potential danger) is when this data is combined with other data. De-anonymization techniques using sparse datasets have been an active area of research for at least 15 years and it is often surprising to people how much can be gleaned from a few pieces of seemingly unconnected data.
Seems pretty handwavy. Can you describe concretely how this would work?
Here's one of the earlier papers I remember off-hand, demonstrating one methodology.
New statistical techniques (and improvements to existing ones) have appeared in the ~18 years since this was published. Not to mention there is significantly more data to work with now.
"We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset."
From the Wiki I linked:
"Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them." [...] "A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."
Point being that operational security is hard, and it takes a lot less to "slip up" and accidentally reveal yourself than most people think. Obtaining a location within 250 miles (or whatever) can be a key piece of information that leads to other dots being connected.
Other examples (albeit with less explanation) include police take downs of prolific CSAM producers by gathering bits and pieces of information over time, culminating in enough to make an identification.
>"We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset."
> [...]
"Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them." [...] "A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."
The only reason the two attacks work is that you have access to a bunch of uncorrelated data points. That is, ratings for various shows and their dates, and cellphone movement patterns. It's unclear how you could extend this to some guy you're trying to dox on signal. The geo info is relatively coarse and stays static, so trying to single out a single person is going to be difficult. To put another way, "guy was vaguely near New York on these dates" doesn't narrow down the search parameters by much. That's going to be true for millions of people.
>To put another way, "guy was vaguely near New York on these dates" doesn't narrow down the search parameters by much.
That's why I said that this data alone is probably worthless, but can gain value when combined with other data. ("As a piece of data alone, the results are probably not of significant use")
The combining of data is the important bit and the entire emphasis of both of my other comments.
Two pieces of otherwise anonymous data can, when combined, lead to re-identification.
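To make the "combining data" point concrete, here is a toy sketch of how each extra clue shrinks the candidate set. Everything in it is made up for illustration: the names, the attributes, and the `candidates` helper are all hypothetical, and a real population would be far larger and messier.

```python
# Toy illustration only: every name and attribute below is invented.
population = {
    "alice": {"region": "DFW", "active_hours": "night", "game": "chess"},
    "bob":   {"region": "DFW", "active_hours": "day",   "game": "chess"},
    "carol": {"region": "ORD", "active_hours": "night", "game": "chess"},
    "dave":  {"region": "DFW", "active_hours": "night", "game": "go"},
}


def candidates(population, **known):
    """Return the names whose attributes match every known clue."""
    return sorted(
        name
        for name, attrs in population.items()
        if all(attrs.get(key) == value for key, value in known.items())
    )


# A coarse region alone leaves several people in the set.
one_clue = candidates(population, region="DFW")

# Adding two more individually weak clues narrows it further.
three_clues = candidates(
    population, region="DFW", active_hours="night", game="chess"
)
```

No single clue here identifies anyone; it is the intersection that does the work, which is the whole argument.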
>Two pieces of otherwise anonymous data can, when combined, lead to re-identification.
How are you going to get more anonymous data? Practically speaking if your target has such poor opsec that he's hemorrhaging bits of data, you probably don't need this attack to deanonymize them.
All over the place? Your comment history here (and mine!) is full of data. Each piece alone isn't identifying, but there's a good chance that in aggregate it is.
If you share that username on discord/twitter/reddit/steam/whatever, that's even more data. If you reference old accounts anywhere, you guessed it, even more.
>you probably don't need this attack to deanonymize them
My comment wasn't necessarily specific to this attack, just noting that this attack can be an additional piece of data in the chain of re-identification.
You've gone from "not convinced on the real world applications here" to "how are you going to get more anonymous data". If we assume that you can get some data somewhere (a small list of example sources above), can we agree that there is, possibly, a real world application?
That's marginally better, but can still be a problem. Just consider e.g. a whistleblower working for a company with a very small satellite office in a given country.
Did you even read it? There's no IP leak. And if you're a high-value target, then using some kind of proxy is literally the first step you take. The attack is nothing but an exaggeration and has no merit in the real world.
Yes, I read it. Information about your IP address is leaked, as that's how Cloudflare routes you to a given datacenter.
And I strongly disagree that being able to uncover somebody's rough geographic location is not a privacy problem.
I wouldn't be surprised if this, for example, lets you deduce if somebody is currently home, at work, or commuting (as all three ISPs might be hitting different Cloudflare datacenters). That's not information everybody is comfortable broadcasting to the world.
> Privacy isn’t an optional mode — it’s just the way that Signal works. Every message, every call, every time [1]
While I don't consider this a critical bug requiring an immediate technical remediation from Signal, this should definitely be either fixed or called out in the documentation at some point.
Weather predictions are the weather channel's entire brand, but people understand the concept well enough to know that this doesn't mean it's infallible. There is a limit to how many warning stickers we need in the world. If you want to rely on a particular feature, maybe check that the product supports said feature. Signal does encryption, not onion routing
I guess it can be useful for tracking fugitive political dissidents, terrorists, etc. If you can narrow their location down to 250 miles, it's already very useful information. And without raising any suspicions.
It's not really narrowing it down to 250 miles; it's narrowing it down to a circle whose radius is at least 250 miles, i.e. an area of ~196,000mi^2.
My closest Cloudflare CDN is just listed as "DFW". The DFW metro area is about 8,700mi^2, and I imagine I could be even further than the "metro area" and still get the "DFW" Cloudflare datacenter.
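For anyone who wants to check the arithmetic, a quick sketch. The 250-mile radius and the 8,700mi^2 metro figure are the numbers quoted above; the function name is just for illustration.

```python
import math


def circle_area_sq_miles(radius_miles: float) -> float:
    """Area of a circle with the given radius, in square miles."""
    return math.pi * radius_miles ** 2


# Radius quoted for the attack's accuracy.
area = circle_area_sq_miles(250)

# DFW metro area figure quoted above, in square miles.
dfw_metro = 8_700

# How many DFW-sized metro areas fit inside the circle.
ratio = area / dfw_metro
```

So the search area is a bit over twenty times the size of an entire large metro area, and that is the best case.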
In their little video animation, the area inside the overlap of those two circles encompasses several states. The edges of the two circles go from Washington to Florida and almost include Chicago. The target could have been in Denver or St Louis or Las Vegas or Phoenix or San Diego or San Francisco or Amarillo or El Paso.
If only we knew OBL's Discord handle then we would have known he was about where we figured he was all along...
And then this whole thing gets thrown off if one uses a VPN with an endpoint somewhere other than where you are. Click a button, suddenly my datacenter is AMS. Click it again, suddenly it's OTP...
>If only we knew OBL's Discord handle then we would have known he was about where we figured he was all along...
Discord is just an example, this can apparently work with many apps that store user attachments on Cloudflare.
>Click a button, suddenly my datacenter is AMS. Click it again, suddenly it's OTP...
Well, if the location keeps changing, it's obvious it's not their real location. But if it’s always the same, no matter what, that’s a huge clue. Of course, this works best when you’ve got some other data to back it up. It’s kind of like playing Akinator - the more answers you get, the closer you get to figuring out the target. One answer might not tell you much, but three or four?
In their example, the target pinged two datacenters, one in Dallas and one in San Francisco. Their requests might bounce between datacenters even if they aren't on a VPN.
This assumes that Osama bin Laden has poor enough opsec that he's using (eg.) Discord without a proxy. State actors have much more sophisticated techniques available.
(It's still an interesting vector, though! But it's true that the headline and writeup are a bit sensationalized.)
This is certainly an "attack" but not one you'd normally associate with zero click. There is no code execution, but some tricks to see which Cloudflare datacenter cached the image -- giving a very rough area the user is in. Impressive and insightful nonetheless.
depending on the circumstance, the rough area might already be useful to adversaries of the person trying to hide. I wouldn't expect things like criminals etc. to suffer from this, 300 miles is a big radius for example... but if you want to know if 'the guy is still in country' or something like that (for instance law enforcement) it's useful for them. such parties could then collaborate with local resources to do further investigations. knowing which local resources in what area to enable might save a lot of 'costs'.
as you said, impressive and insightful. :D kinda feel like the docs on it were a bit chatGPT aided, they are super clear and full of 'certain sentences'. (this is totally an excellent use-case for that, so not bashing on it at all!).
You would know if they are over a cellular network or checking on mobile.
If someone sends you a youtube link and you hit play, YT knows who you are, both from a network perspective and potentially the logged in user.
If you are using signal in a high risk environment, you should be using it from a system that contains no extra information about you. This is the same posture one should take when using Tor.
Basic opsec.
I don't think these kinds of things are in Signal's threat model. Isn't it meant as a messaging platform for people with nothing to hide?
i don't think you can call opsec basic, since it requires tons of knowledge about technology and techniques adversaries might deploy against you. targets of attacks don't necessarily have this kind of knowledge.
opsec is _incredibly_ hard for a person not deeply into technology and this type of information. you might argue that you need to stick with certain tools and techniques that are known good, but new vulnerabilities and techniques implemented against you can completely shatter previous knowledge on what's good and bad opsec and still break it despite doing it 'very well'. (like certain darknet markets being closed down due to new vulnerabilities being found in the platforms they use...)
most people who rely on opsec/tradecraft for a living, also rely on teams of people to help them maintain it and validate it constantly... (or eventually fail and get bitten).
you are right though that it's unlikely a company or app producer would have a threat model tuned to people who want to hide stuff. those things generally tend to be closed down sooner or later. (encrochat and such services...)
You are absolutely right, I think it should be basic opsec, but is probably advanced opsec seeing how many folks get tripped up by this stuff.
This means never using a browser context in which you have ever logged into any service that is personally identifying. That also covers the order in which you load pages. If your ritual is to open Pinterest followed by Slashdot, that is now your fingerprint.
It isn't just what you do, but how you do it and the ordering between those events. You also don't want to accidentally deanon yourself or your peers, even when everyone is trusted because it also leaks group membership information.
The mental framework for opsec can be modeled as vector calculus and differential geometry. You have to think of the flow of information across a surface and in the integral of that flow. Assume an adversary with perfect total information.
So many comments get caught on the wording 'deanonymization'.
Is there a standardized definition of 'deanonymization' across industry experts, privacy-conscious people and hackers?
For many commenters, it looks like deanonymization means unveiling highly sensitive info like name, address, email, etc.
For privacy-conscious individuals and hackers, it looks like it means 'revealing a data point that shouldn't be revealed'.
As a signal or Discord user, I would expect my country location not to be revealed to a person I don't know. So the latter definition makes sense to me.
As you say, it depends on the person but I think for most people an acceptable definition is "deanonymization reveals PII". What qualifies as PII depends on the context/jurisdiction but typically an IP address would be considered PII whereas country (or a similar broad region) would not.
I'm a bit at a loss there. Has _anyone_ ever considered Signal to be anonymous? Or Discord? If so, I have bad news: they are not anonymous. At all. Not even slightly anonymous. Nor did they ever claim to be, they only claim to not be able to read your messages (Signal claims that, I don't know about Discord, I doubt it). And that claim has flaws (sure the crypto is sound but have you thoroughly reviewed and compiled the version you are using right now?)
At the very best, they are weakly pseudonymous, but that's about it. And yes, loading media by default has always been a staple of applications who prioritize their users' convenience at the expense of some security, a fine choice for the usual threat model of their users. And embedding media in messages has always been a staple of deanonymization attacks.
So ok, the tracking pixel has been shown to still be a relevant technique today, that's nice but not surprising.
If you want to remain anonymous though, don't use Discord or even Signal, and I'd advise against posting on HN either. Maybe, if you automate the pasting of messages (no js!) that have been reworded by a local llm from throwaway accounts through whonix, at random times that can't be correlated to your timezone, you _might_ have a chance. Don't bet on it.
I am currently banned from the Signal subreddit for pointing out that we only have Signal's word that they don't collect metadata. So, yeah, people do consider Signal anonymous...
This is just the fundamental way the internet works, and is the reason that anonymizing proxies like Tor exist.
If you don’t want people to be able to detect your rough geographic location, you should be using a proxy to hide it. For everybody else, knowing the edge server you are closest to is really not a threat.
No, it isn't. This is Cloudflare exposing metadata when it really shouldn't. Having a configuration option or an origin response header akin to CloudflareCache: private or something is trivial for them to implement.
The same information would then be available in the timing, but given the distributed nature here, that would be a lot harder to pull off.
People for whom it's a threat don't necessarily understand anonymizing proxies - very few do. Signal is supposed to provide security for those who do not.
Where does Signal claim that, or who decides what they're "supposed" to provide?
If wishes had wings, sheep would fly. People who want their computer to do a certain thing can also be expected to do a quick web search for how to make it do said thing. E.g.: hiding location? Use onion routing. Signal doesn't claim to hide your country (heck, they require your phone number!) so it seems wishful thinking to say they should have included e.g. a Tor client and enabled it by default
There's a real difference between Discord itself knowing your location and any Discord user in the world knowing it. Just like there's a difference between the VPN provider knowing your ipaddr and every website you visit knowing it.
What is the benefit of caching images in a cdn for Signal?
Assuming local client-side caching, the total number of requests for that resource should be very small, probably one in the vast majority of cases.
On an unrelated note, it seems like Cloudflare could very easily fix this by not returning the cf-ray header, or at least having an option for the customer to remove it. Although, it might still be possible to get that information based on timing information...
> it seems like Cloudflare could very easily fix this by not returning the cf-ray header
Then you just look at the response time. If the resource needs to be fetched from another continent, this is probably reliably measurable
Same for websites trying to hide which users exist: do a login request for an existing username and it'll do the password hashing (usually adds at least 50 ms to the response time), whereas for an invalid username it early exits. The fix is to always run the same code, so always do the hashing, which very few sites do. (Or not care about revealing this and telling people straight out that their username is unknown, if that fits with your threat model.) So to get back to Cloudflare's case: it won't help unless they delay responses, which is the opposite of what they're supposed to do
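A minimal sketch of the "always run the same code" fix for the login case, using only the Python standard library. The in-memory `USERS` store, the iteration count, and the function names are assumptions for illustration, not anyone's production setup.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical in-memory user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"alice": (_salt, hash_password("correct horse", _salt))}

# Dummy credentials so that unknown usernames cost the same hashing work.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("dummy", _DUMMY_SALT)


def check_login(username: str, password: str) -> bool:
    """Verify credentials without an early exit for unknown usernames."""
    user = USERS.get(username)
    salt, stored = user if user is not None else (_DUMMY_SALT, _DUMMY_HASH)
    # The expensive hash runs on every request, known user or not.
    candidate = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    matches = hmac.compare_digest(candidate, stored)
    return matches and user is not None
```

The point is that both branches pay for one full hash, so the response time no longer tells you whether the username exists. The CDN analogue would be delaying every hit to look like a miss, which is exactly what a CDN is built not to do.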
It isn't caching, it's CDNing. It is just an artefact of CDNs that they act as caches for the original content, and for improved distribution response time they cache at the server nearest to the requester. ('Nearest' being an approximate heuristic: it is a property of the anycast route tables in the BGP routers the request passes through, so it is actually a 'best route'.)
I don't believe the Signal app/network is choosing to cache images in a CDN?
But any user can send any other user a message that includes a link to a CDN-cached resource. Isn't that the "attack" here? Or am I misunderstanding?
Yes, Cloudflare should allow customers to disable that header, and Signal shouldn't cache images sent to a single person, or even groups of less than a few hundred people.
Look at the locations where Cloudflare has its servers [1] in the middle of Europe. With Geneva, Zurich and Munich, there is definitely the possibility that this attack on Signal will leak whether someone is at home or not.
I don't understand how Signal could dismiss this so easily. I'm starting to get a bad feeling about their responses to these "low" stakes attacks. They already dropped the ball on the database encryption mishap on desktop.
So, it's like the [Spectre] attack against CPUs: trigger an access from a privileged context, check if the access has filled in some cache, infer privileged information from that.
It seems that time and again, security-enforcing procedures assume that many functions they invoke are pure, but in reality these functions have side effects, and these effects are observable much easier than the security requires.
The actual problem here is that the secured area is only the stuff that came through the encrypted channel. Any access beyond it, like following a link, is obviously insecure. If the link was sent via the secure channel, it becomes even less secure because it allows one to observe a correlation between the secure channel (otherwise impenetrable) and the insecure outside context, and to blow (some of) the cover. Opening links via Tor would mitigate it a bit.
The hard truth here is that almost everything may have observable side effects, so opsec needs to permeate all aspects of life, the more cover you need, the fuller. This is mostly incompatible with a convenient UX, but, to be popular, a secure messenger has to be reasonably convenient. This necessarily limits the level of security attainable by its casual use.
This is quite a detailed write up. I went through the post quickly, but didn’t get why Signal would just download an attachment from an unknown number/contact without first prompting the user to accept or deny the conversation request. I’ve seen conversation requests always waiting for me to accept or not. If I don’t accept, I don’t see any messages on that chat and the other person doesn’t get any indication of message delivery. What have I missed?
If the message is from a known or trusted contact, I think there can be larger problems than just a rough location reveal.
>I went through the post quickly, but didn’t get why Signal would just download an attachment from an unknown number/contact without first prompting the user to accept or deny the conversation request.
Where are you getting the impression that signal auto-downloads attachments from an unknown number/contact? The OP says there's auto-download, but not that it happens from unknown contacts.
> I went through the post quickly, but didn’t get why Signal would just download an attachment from an unknown number/contact without first prompting the user to accept or deny the conversation request.
I guess you went through the post too quickly, because it goes over how that's exactly how it works. Unless you have push notifications enabled and on default settings to include the content in the push notification.
Congrats on finding this. Very impressive for a 15-year-old!
The section "How to Protect Yourself" is lacking.
Step 1. Don't receive this information in the push message. Only send the fact that there is something waiting for you in the app. Chances are there are other vulnerabilities that compromise the end-to-end encryption guarantees provided by the app (and only by the app).
In Signal on iOS: Click on your icon in the top left corner. Click on settings. Click notifications. Click on display below "message contents". Make your choice.
Another situation where convenience clashes with security, unfortunately.
Step 2: If you use Discord, don't allow invites from _anyone_.
It's quite bizarre that social media apps allow anonymous people to interact with you. 99% of the conversations I have are with people that I roughly know.
Discord is for gamers and quite a lot of people will be playing a game and tell someone "add me on discord my tag is xyz". Not allowing invites would seriously cut into the usability.
I'm... not actually clear on what those reasons are? For the adder, the experience is exactly the same - the only difference is that there's no longer an adder and an addee - instead there are two adders.
> It's quite bizarre that social media apps allow anonymous people to interact with you
Bit strange to attribute this to 'social media apps', isn't it? I'm interacting with an anonymous person right now. Most platforms allow it, including the older ones (i.e., IRC)
You can add them by creating unique, temporary UUIDs/links that they can use?
You know them from somewhere else; let's say I play a game and we decide to get into a voice chat. We could create a temporary, dynamically created voice chat that we can all join (much like Google Meet) where all of us are anons.
Then, if we really want to know each other, we can then share the UUIDs.
I understand why ANYONE can send an email to me (I can decide when, or whether, to check them)
I don't understand why ANYONE can whisper in my ear (I can't decide, since they are pushed to the top of the app)
If I use Signal or Discord to send someone a link to anything hosted on a server controlled by me, provided that the user opens the link, I will get an exact IP address of the user. IP address is much more useful in de-anonymizing the user than the nearest CloudFlare datacenter location.
A fun attack, but I don't think this is a significant improvement over the existing state of the art using delivery receipt timings ("Hope of Delivery"). https://arxiv.org/pdf/2210.10523
Am I correct in surmising that someone who uses a VPN on their phone while sending Signal messages/content would be cloaked, provided the VPN server they pick isn't near them?
Usually, being identified as being part of such a huge group that there is no chance of being found is an example of anonymization, rather than deanonymization. The author might not like that there is any potential to narrow things down at all, but the information provided by this could be easily wrong if a VPN were used to have the traffic egress through a different geographic region.
Hmm "within 250 miles" is not deanonymization in my book. Unless you live in the middle of the desert. In which case there won't be a cloudflare DC near you anyway.
It's nice but at most will give you an indication of city. Perhaps together with some additional OSINT you could find the user but you'll need a lot more clues.
> When a user sends an attachment (e.g., an image) on Signal, it is uploaded to cdn2.signal.org.
Why is that even the case? I had understood that (binary) attachments are embedded into the encrypted message and hence transferred directly from sender to receiver.
Obviously, retrieving media from an external location saves bandwidth at multiple positions. I am not a security expert, but it seems almost trivial to see how storing message data on an external server conceptually facilitates attacks like this one.
Isn't that the same reason a link preview is generated at the sender first and then embedded into the message as an image?
Clever finding but the title does no justice to the actual attack. Even a bare minimum threat model requires a user to use a VPN or Tor, which completely eliminates your "0day". Signal rightfully declined your report because its only job is to provide secure communication.
Signal is definitely also aiming to provide metadata privacy, which they understand to be part of secure communication.
Otherwise, they wouldn't pad attachment and message sizes, offer a "sealed sender" feature, allow relaying all calls to avoid callers/callees from learning users' IP addresses etc.
While not 0-click, this might work even better using DNS and a more dense network of anycast DNS servers delegating a subdomain. Send a link to the target, and the DNS resolve should end up at your anycast DNS server. Respond with a CNAME entry, triggering a second DNS request and you can determine at which DNS server the request was served.
Would also work without anycast (and thus probably be able to use a very dense botnet) and with a long list of NS entries for your domain.
Whether that's crazy depends on your threat model. If there's no reason, it could still be crazy in the sense of protecting from an irrational fear. If you communicate with people or organisations who shouldn't know your location, it makes sense. It depends
CDNs do not choose datacenters for users based on geographic distance. The number one metric is latency, but latency != physical distance. The second metric is optimizing the price of data transfer between peers and IXPs, which results in very dynamic routing rules. Then consider also network/software hiccups/maintenance and the distribution of datacenters' load...
The accuracy of this geolocalization depends very much on peering agreements.
I don't know about the US, but this will not be very accurate within the EU.
As an example:
In Hungary, there's pretty much only one peering hub (bix) and there's only one Cloudflare datacenter. You've already geolocated me better than this hack just by knowing my language or phone prefix.
When I am traveling, I most likely use my mobile data. That data is tunneled to my mobile provider, exiting to the public internet at exactly the same server.
In my case, Cloudflare will identify me as BUD even when I'm roaming in a different country.
This behavior is very typical for the EU, because the telco landscape is fairly fragmented, and each company typically has only one, or at most two, peering locations.
This may be different within the US where the distances are bigger, and latencies matter more, so there is more incentive to peer locally.
What's old is new. Does anyone remember the forum signatures that would display the viewers IP address and location on a little wooden signpost held up by a troll-looking creature?
I was fascinated by this once I learned how it worked. At the time I was learning php and wrote a script that would draw graphics based on the requesting ip address and return as gif, then used that as my avatar on a few phpbbs. Learned a lot.
My friend would figure out the username, but he never did it maliciously, just for the challenge. Forums would show you which user was viewing a thread...
This is pretty interesting, and well documented. Great work! I wonder if there is a way to turn off notifications or if the approach is to simply not run such apps.
Not sure about mobile apps, but in Discord desktop there is an option under "settings -> notifications". Your browser may also have notification settings that would help.
This changes the attack from a 0-click attack to a 1-click attack.
It seems to me that a key requirement for this attack is that both the attacker and the victim load the same link, that is, that the attacker knows the URL the victim is going to load. If Signal/Discord created a different link to be given to the victim, and never shared it with the attacker, this attack wouldn't work.
That could be as simple as adding some extra pseudo-random parameters to the URL which will be ignored by the origin (but honored by the caches), or as complex as creating a completely separate URL for the receiver of the message, and somehow giving it to the receiver without giving it to the sender (easy on Discord, harder on Signal due to its end-to-end nature).
Since creating separate URLs would largely defeat the purpose of caching, a simpler solution would be to just disable caching, as Cloudflare suggested in their response.
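The "extra pseudo-random parameters" variant above could look roughly like this. It is only a sketch: the parameter name `r`, the helper name, and the example host are all made up, and it assumes the cache keys on the full query string while the origin ignores the extra parameter.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def per_recipient_url(base_url: str, param: str = "r") -> str:
    """Append a random query parameter so each recipient gets a distinct cache key."""
    parts = urlsplit(base_url)
    # Preserve any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    # 16 random bytes, URL-safe encoded; unguessable by the sender.
    query.append((param, secrets.token_urlsafe(16)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical attachment URL, for illustration only.
url_for_victim = per_recipient_url("https://cdn.example.com/attachments/abc")
url_for_someone_else = per_recipient_url("https://cdn.example.com/attachments/abc")
```

Since the sender never learns the recipient's token, they have nothing to probe the caches with. As noted, the cost is that every recipient becomes a guaranteed cache miss, so at that point you may as well turn caching off.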
I guess one possible fix would be for cloudflare to implement an option to disable the x-cache header for unauthorised users. This way Signal devs could still check their setup by sending authentication headers.
But it wouldn't solve the issue completely because you could always check the response time. Signal should probably disable caching. I guess it's rare for someone to repeatedly download an attachment. Once it's there it's there. For group conversations it could be an issue though.
Not sure it's so rare. A large number of group chats will have people in the same area. For me it's the vast majority: family chat, groups of old classmates or flatmates are mostly in the same country, work chats too... I can think of one exception where a group member will consistently be hitting a different Cloudflare node from everyone else, but for everyone else, every time I send a picture into a group chat the caching will save traffic
Cool writeup by a 15yo, except for the way it completely oversells in the title.
Basically this allowed an attacker to find out which cloudflare data center a victim connected to when being tricked into loading something from cloudflare. This is often within a 250 mile radius of where they're living but not necessarily.
Can't one find out someone's IP just as easily by making them make a request to a URL controlled by an attacker? Is the problem that cloudflare is whitelisted for 0-click?
> Can't one find out someone's IP just as easily by making them make a request to a URL controlled by an attacker?
Unless you can find another flaw in Signal, that'd likely be a 1-click attack, which is less valuable than the 0-click attack demonstrated by the author.
There was mention that the Teleport tool no longer works after the bugfix of the underlying issue (calling other cf locations via Workers and an internal subnet). It seemed like the ability to query which caches HIT on the dye-test image relied on being able to call out to each other DC.
Without this control over the route (driving the probing of which caches were hit), the attack would no longer work, right?
Ah, the VPN deployment which probes from various geographies? It has limited coverage (according to author, about 54% of all Cloudflare datacenters) but still a sometimes-working attack, granted.
However, Cloudflare are known for being harsh on VPN exit points and the behavior of requesting the same (unique each pass) image from every geography and then never again, would probably look significantly suspicious, but yeah it seems not to be a priority for cloudflare at the moment.
"deanonymization" in this case is just plain wrong, you can't even tell which country the user would be in for sure. Also any proxy/vpn will completely protect against this.
It's "a very rough estimation of a user's location when they are not using a vpn".
2 questions - why do airports get cached with Cloudflare requests, and, if I use a VPN, am I getting content from my usual Cloudflare centre or the one from the country on the VPN I’m using?
The attacker uses a patched version of Signal to be able to intercept requests and to block a GET request to the attachment they have just created. At least that is my understanding.
Is he just 15?
The level of technical details, and this part is not that simple:
“quickly patched the Signal desktop app to remove SSL pinning and configured Burp to intercept and view HTTP requests/responses sent through the app”
You’d be surprised at how adept the younger generation can be, especially those who’ve grown up with technology. As tech evolves, so do they. There are kids who genuinely apply themselves, and because they’ve been immersed in this environment, it’s practically second nature to them. I remember the late 1990s: I was young, but more than anything, I was curious about how things worked, I had the luxury of time, and access to technology to explore it. I started coding in C++ when I was around 13, and honestly, I still feel like I started too late.
There are also a lot more kids doing this than before. Like, I was one of 12(?) students in our high school AP Comp Sci course, then just one year after, 120 students took the same course.
Well, unlike with tracking pixels, you are not in the direct request path and cannot block it. You also have no way to monitor/log if it is happening (like you can in theory with a packet capture).
It's obvious in hindsight, but I bet no one would have mentioned this possibility as why you should disable notification previews or that simply receiving a notification would possibly reveal this information.
If your target is savvy enough not to click random links sent by strangers, it's hard to get them to load it. Many apps have caught onto the tracking pixel technique. It used to work for iMessage long ago.
Looks like Cloudflare are still sending out the airport locations and hit status on the response headers. Maybe I'm missing something but it seems like if you had a large VPN network you could run a distributed query to figure out which edge nodes have cached the url.
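For what "figuring out which edge nodes have cached the url" means in practice, here is a sketch of just the header-reading half, with no network code. It assumes each vantage point hands back its response headers as a dict, that `cf-cache-status` reports HIT/MISS, and that `cf-ray` ends in the data center's airport code. The sample values and function names are invented for illustration.

```python
def colo_from_ray(cf_ray: str) -> str:
    """Extract the trailing data-center code from a value like '8f1c2d3e4a5b6c7d-DFW'."""
    return cf_ray.rsplit("-", 1)[-1].upper()


def cached_colos(responses):
    """Return the sorted data-center codes that reported a cache HIT.

    `responses` is an iterable of header dicts, one per vantage point.
    """
    hits = set()
    for headers in responses:
        # Header names are case-insensitive, so normalise them first.
        lowered = {key.lower(): value for key, value in headers.items()}
        is_hit = lowered.get("cf-cache-status", "").upper() == "HIT"
        if is_hit and "cf-ray" in lowered:
            hits.add(colo_from_ray(lowered["cf-ray"]))
    return sorted(hits)


# Invented sample headers from three hypothetical vantage points.
sample = [
    {"CF-Cache-Status": "MISS", "CF-Ray": "8f1c2d3e4a5b6c7d-AMS"},
    {"CF-Cache-Status": "HIT", "CF-Ray": "8f1c2d3e4a5b6c7e-DFW"},
    {"cf-cache-status": "MISS", "cf-ray": "8f1c2d3e4a5b6c7f-ORD"},
]
result = cached_colos(sample)
```

One caveat: a probe that misses will itself populate that data center's cache, so each location is only good for one look per URL.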
I guess signal preview-loading or remote-image-loading features are always going to be usable to identify broadly what region a user is in, using this attack.
Can one disable those features in Signal? Would be annoying because they are nice, but yeah.
If you don't want that attack to be able to locate you somewhat (or at least locate your internet endpoint, if you are using a VPN or something), you will need to turn off signal previews and network image displays. Right?
Can probably achieve the same level of deanonymization by just monitoring what times the user communicates most often. Or send them enough links that they'll click on.
I think all these things are absolutely ridiculous.
I use alpine (the email client, not the Linux distro). Before that, I used pine.
Every single thing that gets loaded from anywhere on the Internet has to be the result of an action that I take. Nothing ever gets loaded automatically. I get to choose if I load the thing using the server that I'm connected to, or if I load it directly on my local machine. I know the implications of each.
The fact that programs, particularly ones that are supposed to be for the security minded like Signal, load anything by default, automatically, is just, well, naive.
I can't be the only person who thinks that people who don't think these things through shouldn't be working on apps and email clients. Sure, people would have a cow if their email client didn't load every frigging thing and run remote Javascript and so on, but in Signal? Really?
(end rant)
I see that this can be turned off. I will now tell everyone I know that uses Signal that this should, in fact, be turned off.
I can't even conceive of what governments are able to do. You could technically route Signal over the Tor network, but then even Tor has vulnerabilities from its C code.
"Luckily" my ISP is DTAG which has horrible peering with Cloudflare.
So I'm routed through Warsaw (WAW) most of the time, even though there are multiple closer datacenters in Germany.
You could use this technique to see which geographic areas view which sites, based on the content cache age. You would need a list of sites, but it would let you bucket a geographic area by its top sites from the test corpus.
Unfortunate that Cloudflare patched the issue enabling specific datacenters to be targeted. Would have been extremely useful for finding the location of servers behind Cloudflare.
I hate how teenagers can't help but post their age when it comes to achievements like it makes them special. 'hai im 15 and i hack billion dollar companies in my spare time.' This is cringe AF. I don't care if they're "only a teenager." Presumably, the age was written to signal how le special they are and not liek other teenagers. So if you want special treatment learn how to be modest and don't over-exaggerate your achievements. Any adult who managed to read past this sentence is a bigger person than I am.
This doesn't strike me as a new 'attack' (I have to imagine there's even a name for such attacks), and 250 miles seems a large radius to 'deanonymize' someone, even a high-value target (even if such people didn't take any other measures to avoid being tracked...)
It's a classic timing attack. You can detect which Cloudflare datacenter is "closest" (ie. least network latency) to a targeted Signal or Discord user.
The attacker can't be forced to make a request. In this PoC the attacker disabled their own outgoing image requests.
But that wouldn't help anyway, even if the image could be cached near the sender first, or the signal server prewarmed some other cache. After the victim opened the image, the attacker would see two locations that have the image cached, and could easily deduce which one is the victim's location (e.g. if Signal pre-warmed a random cache, repeating the attack a couple of times would be enough to eliminate the randomness).
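To make the "repeat it a couple of times" step concrete: if each round yields the set of edges holding the attachment, the victim's edge is whatever survives in every round while the random pre-warmed one keeps changing. A minimal sketch with made-up airport codes; `stable_edges` is a hypothetical helper:

```python
from typing import Iterable, Set


def stable_edges(rounds: Iterable[Iterable[str]]) -> Set[str]:
    """Intersect the sets of cached edges observed in each round of the attack."""
    sets = [set(observed) for observed in rounds]
    if not sets:
        return set()
    # An edge that appears in every round is unlikely to be a random pre-warm.
    return set.intersection(*sets)


# Three rounds: the victim's edge (ORD here) plus one random decoy per round.
rounds = [{"ORD", "NRT"}, {"ORD", "GRU"}, {"ORD", "SYD"}]
print(stable_edges(rounds))
```

A decoy could coincide with the real edge by chance in a single round, which is why more rounds make the result firmer.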
Why does CloudFlare return whether it was a cache hit or miss? This information could be hidden/removed. I understand it's not a complete solution of the issue, because cached responses will return much faster than non-cached ones, but it's a step in the right direction.
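For what it's worth, here is roughly how the timing side could still be read off with the header gone. This is only a sketch: it assumes you already have latency samples per edge for the target URL and for a URL you know is uncached (both dictionaries are hypothetical inputs), and the 0.5 ratio is an arbitrary threshold, not a measured one:

```python
from statistics import median
from typing import Dict, List


def edges_that_look_cached(
    target_ms: Dict[str, List[float]],
    miss_baseline_ms: Dict[str, List[float]],
    ratio: float = 0.5,
) -> List[str]:
    """Return edges where the target URL responds much faster than a known cache miss."""
    suspects = []
    for edge, timings in target_ms.items():
        baseline = miss_baseline_ms.get(edge)
        if not timings or not baseline:
            continue  # nothing to compare against for this edge
        # Medians are less sensitive to one slow or fast outlier than means.
        if median(timings) < ratio * median(baseline):
            suspects.append(edge)
    return sorted(suspects)
```

Randomized delay on hits would blur this, but only if the added delay is comparable to the gap between a hit and a miss.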
> it's possible for an attacker to run a cache geolocation attack to find out which local datacenter they're near--similar to how law enforcement track mobile devices through cell phone towers.
very much disagree on this, they track mobile devices through your connection strength to multiple cellular towers while this attack proves which singular datacenter the victim is nearest.
Don’t get me wrong the write up is really interesting but it does feel like the author is a bit of a sensationalist.
While the detection area of the Cloudflare attack is bigger, I think the main problem here is that it's much easier to get access to it than to cellphone towers.
people learn when they’re given kind, direct, actionable feedback from people they trust - not when they’re called sensationalists by random critics on the internet.
>people learn when they’re given kind, direct, actionable feedback from people they trust - not when they’re called sensationalists by random critics on the internet.
So what are we supposed to do? Dox him, find who his friends are, and use them to backchannel feedback? I think the "sensationalist" critique is direct and actionable - just don't do it.
It would probably be better for such learning to occur in a place that doesn't create immutable records of judgments from one's peers; i.e. Hacker News comments.
"Telegram, another privacy-focused application, is completely invulnerable to this attack"
"Discord […] citing this as a Cloudflare issue other consumers are also vulnerable to"
"Cloudflare ended up completing patching the bug"
I wish Signal would react differently. I still remember the bubble color controversy when they changed their mind after the backlash and not before. :-)
> "Cloudflare ended up completing patching the bug"
This short quote fragment is a little misleading: Cloudflare patched the bug in their systems that allowed you to send HTTP requests to any CF data center, regardless of where the originator of the request lives. This is likely something they want fixed for a large variety of reasons, some probably much more important than the specific attack OP wrote about.
> I wish Signal would react differently.
The severity of a potential security issue, or the determination of who is responsible for fixing or mitigating it, is a matter of opinion. Just because you think this is important for Signal to fix, it doesn't mean it's some absolute truth that it does. At the risk of appealing to authority, I would expect that people who run a security/privacy-focused messaging project to have a better handle on classifying these sorts of things than random people on HN like you or me.
But of course, sometimes they'll get it wrong too. I'm not familiar with the bubble color thing you mention, but sure, nobody's perfect; we're all human and we make mistakes. I'm personally not convinced Signal needs to do anything here. A 250 mile radius is quite a large area, and users can already choose to not auto-download attachments. To be fair, though, I think a simple way for Signal to fix this would be to disable caching on the attachments HTTP endpoints, though that might increase their bandwidth bills and increase load on their servers, depending on what their access patterns look like.
I just sent a feature request[1] to Signal with the following text:
I understand that Signal does not consider this
https://gist.github.com/hackermondev/45a3cdfa52246f1d1201c1e8cdef6117 to be
a valid security bug, but it would be helpful to at least be able to
mitigate it.
Please add an option in settings to disable automatically downloading
attachments.
That should be enough to change the attack from 0-click (just opening the
conversation) to 1-click (click the attachment). Most people won’t care
about this, but for some every little bit of privacy is important.
Hold on, someone else in this thread noted this does exist:
"You can disable the auto-download. Settings > Data and storage > Media auto-download, you can choose what to auto download for mobile data/wifi/roaming."
So, that part is there, but my question is: it's still an issue when they manually download the image, right? Unless someone never accepts images from anyone they aren't expecting, whose number or unique created ID has never been seen before.
> There's clearly a problem here as Cloudflare says consumers are responsible for protecting themselves against these types of attacks, while consumers (ex. Discord) are putting the blame on Cloudflare.
>I wish Signal would react differently. I still remember the bubble color controversy when they changed their mind after the backlash and not before. :-)
Can you blame them though? They're a non-profit with limited manpower and resources. There's quite a lot of cranks in the security field, and as many people have echoed in this thread, the bug report is rather sensationalist. At some point you just have to pattern match and ignore any reports that seem a bit too cranky. Is this ideal? No. But I don't see how it's any different than summarily dismissing a vaccine skeptic's claim that vaccines are bad, even if there's a kernel of truth buried in there (eg. that benefits for young people are questionable).
But calling this de-anonymization is a stretch, if it can possibly pinpoint you within 250 miles (that's assuming geoip is correct too, which it rarely is).
In their GeoGuesser demonstration video, the highlighted area is densely populated and you still would need to match millions of people vs the online user.
It does provide some hints as to the location of the targeted user, and that is cool!
De-anonymization would take monitoring over a period of time, but it could definitely work. Take this scenario for example: a person of interest is in the area of New York on Jan 1. On Jan 4 they travel to the UK. On Jan 7 they travel to Germany. On Jan 21 they travel back to the US.
The list of suspects would be fairly small when US officials cross-check individuals that travelled US-UK on Jan 4 and Germany-US on Jan 21.
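The cross-check in that scenario is just a filter over travel records. A toy sketch, where the names, regions and the shape of the itinerary data are all invented for illustration:

```python
from typing import Dict, List, Tuple


def consistent_suspects(
    observations: List[Tuple[str, str]],
    itineraries: Dict[str, Dict[str, str]],
) -> List[str]:
    """Keep only the people whose known location matches every observed (day, region)."""
    return sorted(
        name
        for name, whereabouts in itineraries.items()
        if all(whereabouts.get(day) == region for day, region in observations)
    )


# Coarse regions inferred from which data center cached the attachment each day.
observations = [("Jan 1", "US"), ("Jan 4", "UK"), ("Jan 7", "DE"), ("Jan 21", "US")]

itineraries = {
    "suspect_a": {"Jan 1": "US", "Jan 4": "UK", "Jan 7": "DE", "Jan 21": "US"},
    "suspect_b": {"Jan 1": "US", "Jan 4": "US", "Jan 7": "US", "Jan 21": "US"},
    "suspect_c": {"Jan 1": "US", "Jan 4": "UK", "Jan 7": "UK", "Jan 21": "US"},
}
print(consistent_suspects(observations, itineraries))
```

Each extra observation only needs to be coarse; it is the combination that shrinks the list.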
So if you send a picture to a Signal user, it's retrieved via cloudflare, and cached in a data center near that user; now you can look up the cache status and find the data center used. I'd say "deanonymization" is stretching it, unless the user is in the middle of nowhere (no other users near the data center). But interesting writeup anyway.
"Near a user" is also a big assumption. I'm ~200 miles to ORD and ~500 to IAD, but my ISP's peering & upstream arrangements mean Cloudflare serves my traffic 700 miles from DFW.
But, at the same time: Cloudflare isn't going to serve me a cache from Seattle, Manchester, or Tokyo. Pinning down an unknown Signal user to even a rough geographic location is an important bit of metadata that could combine to unmask an individual. Neat attack!
It's also quite insidious as you don't need to control anything on any server to get this information; as long as you can get your target to load a unique URL never before loaded by anyone else, you can simply later poll it with an unauthenticated HTTP GET from different locations, and find which one reports a Cloudflare HIT (or, even if they hid that information, finding the one that returns with lower latency).
If you're allowing user uploaded content, and you use Cloudflare as a CDN, you could mitigate and provide your users with plausible deniability by prefetching each uploaded URL from random data centers. But, of course, that's going to make your Cloudflare bill that much more expensive.
Cloudflare could allow security-sensitive clients to hide the cache-hit header and add randomized latency upon a cache hit, but the latter protection would also be expensive in how many connections must be kept alive longer than they otherwise would. Don't do anything on a personal device or account if you want your datacenter to be hidden!
Pre-fetching also becomes an issue for apps that are meant to be e2e encrypted, since it requires the server to download (read) every attachment. But if the app is already caching the attachment then they’re effectively reading it anyway.
(EDIT: Apparently signal e2e encrypts images prior to upload, so pre-fetching the encrypted blob from one or multiple servers would in fact be a mitigation of this attack.)
I do wonder if Telegram is as invulnerable as the author assumes. They might not be using Cloudflare for caching, or even HTTP, but the basic elements of this attack might still work. You’d just need to modify the “teleport” aspect of it.
Telegram doesn't use local CDNs for caching. All users are associated with one of about five telegram DCs, and upload files to their local DC. If a file was uploaded by a user on another DC, users connect to it temporarily to download the file.
The DC that a user is associated with is exposed by the API - you don't need to get them to upload a file to discover it - but it's so broad that it's not much of a deanonymizing signal. (Knowing that your target is in DC1, for example, just means that they're probably somewhere in North or South America. Or that they registered using a phone number that said they were.)
https://core.telegram.org/cdn
Going forward uploaded content should never go through Cloudflare and it never really needed to.
Add unique urls.
Maybe just avoid it altogether.
> Going forward uploaded content should never go through Cloudflare and it never really needed to.
The problem in this case isn't cloudflare. The problem is that these images load without the user's interaction and the person sending it gets to choose if it's cloudflare or not. So your statement within this context doesn't really work.
The person receiving it chooses to download images or whatever automatically though.
I dunno, I'd still say the problem is at least 50% cloudflare. Why should they make which datacenters have a resource cached be obvious public knowledge? I do agree though, one could still end up inferring this information noisily by sending an attachment, waiting a while, and then somehow querying a lot of DCs and trying to infer times to see if it's cached or not.
Personally, I've never been a fan about so many things like URLs being so public. I get the benefits of things like CDNs and what not and the odds of guessing a snowflake value and what not, but still...all attachments in Discord are public. If you have a URL, you have the attachment. And they're not the only ones with this kind of access model.
Isn’t that because the URL parameters are so long that by design they effectively _are_ the password protection for the resource ? They shouldn’t be able to ‘leak’ to unintended recipients.
Personally, like you I’m also not a huge fan of this, but URLs like that basically should be treated as the passwords. Don’t post them publicly / don’t give them out to people you don’t trust.
There's a part of me that's fine with it for a short-lived URL which contains a temporary access key but for a forever URL with a forever access key I'm not entirely happy with it.
I use it to share memes and shitpost but definitely not something to share sensitive content IMO.
Discord doesn't do forever URLs for attachments any more, they changed that a while back.[0]
The problem here is avatar URLs.
[0] https://www.bleepingcomputer.com/news/security/discord-will-...
For signal then the issue becomes saving who owns what image (so that you can re-issue “passwords”) and THAT is much more dangerous to the users than simply allowing users to grab semi-anonymous links into their cdn with enough of a url to be nearly impossible to iterate through every combination without hitting tons of rate limits. (Ignoring this location cache timing issue.)
Edit: Actually... (in signal's case) it might be possible to provide the user's device 2 tokens, 1 to access the url and 1 to issue new access links. Then the user can request a new access link with their second token when their url access token expires. Signatures would help prevent it from needing to be stored in the database. It would be interesting to try.
Edit2: Also I am now curious... does this mean only text messages are e2ee? yikes.
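On the "signatures so nothing has to be stored" idea above, a minimal sketch of an expiring signed link using an HMAC. This is not how Signal or Discord actually do it; the path format, the `|` separator and the function names are all assumptions for illustration:

```python
import hashlib
import hmac
import time
from typing import Optional


def make_signature(path: str, expires_at: int, key: bytes) -> str:
    """Sign the path together with its expiry so neither can be altered."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_signature(
    path: str, expires_at: int, signature: str, key: bytes, now: Optional[float] = None
) -> bool:
    """Accept the link only if it has not expired and the signature matches."""
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    expected = make_signature(path, expires_at, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

The server only needs the key, not a table of who owns which attachment. It does nothing about the cache-location issue by itself.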
My main gripe is that if someone finds a vulnerability that gives you a list of urls the model falls apart. I’ve seen this happen in organisations :/
But agree with your statement here and others about the lifetime of the data - if something is sensitive or secret you want proper access controls applied, not just openssl rand -hex 8
> Why should they make which datacenters have a resource cached be obvious public knowledge?
I agree that having it in the header for everyone is maybe too obvious. But you could otherwise infer that from timing.
Would removing cloudflare fix the issue? Then the problem is cloudflare related.
Your defense doesn't really work. Sure many entities could share blame but the one fix is getting rid of cloudflare.
I doubt how useful it would be as an attack. As a single point of info it tells you next to nothing. As part of a composition of other indicators it would be the weak link in the chain, probably just causing noise for the not unlikely scenario where the person you're targeting is using a VPN.
If it was any less specific we'd be talking about a deanonymization attack that outs whether or not a target is still on Earth.
Oh, this attack would be a useful tool for e.g., identifying whistleblowers that travel a lot (e.g., in academia, military). If you know their Signal ID, you could send them images from time to time and then compare their coarse locations with travel information for a number of suspects.
I believe they'd have to accept the chat request before any images would be loaded?
Looking at the app options it seems to be possible to disable media auto-download entirely; there's tickboxes for Images/Audio/Video/Documents via Mobile Data/Wi-Fi/Roaming.
Yes, I agree. This attack won't work on competent / paranoid people. What I had in mind when writing the comment: a whistleblower who wants to inform the press about illegal practices in their company and installed Signal to communicate anonymously with journalists. Somehow, a detective working for the company got their Signal ID and contacted them, impersonating a journalist.
> not unlikely scenario where the person you're targeting is using a VPN
Do you think a large proportion of Signal users also use VPNs? I'd expect it would be a higher proportion than the general population but still only a small minority.
> Do you think a large proportion of Signal users also use VPNs?
It is feasible to consider that interesting Signal users mostly use VPN as an extra protection layer.
Being 'interesting' doesn't make you more likely to understand VPNs and opsec. I expect it makes you more likely to try, but there's a good chance of doing it ineffectively.
Note that CF will also route relative to the sites' plan. Enterprise sites are almost always routed to the closest DC, while if that DC is overloaded then lower tier websites, typically just Free sites, will get routed elsewhere (I suppose this is achieved via different anycast ranges where a specific DC is excluded). Although Discord, Signal, etc are almost certainly Enterprise sites.
I have this old site to test this (the list of sites is a bit old): https://cloudflare-test.judge.sh/
WTF? the trace endpoint allows CORS from any origin?!? Why?!
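For anyone who wants to check which data center they get routed to: the trace endpoint (`/cdn-cgi/trace` on a Cloudflare-fronted site) returns plain `key=value` lines, including `colo`. A small parser sketch; the sample body below is illustrative, not captured output:

```python
from typing import Dict


def parse_trace(body: str) -> Dict[str, str]:
    """Turn the key=value lines of a trace response into a dict."""
    fields = {}
    for line in body.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


# Illustrative sample only; real responses carry more fields.
sample = "h=example.com\nip=203.0.113.7\ncolo=WAW\nloc=DE\n"
print(parse_trace(sample)["colo"])
```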
Cloudflare does serve me from France. When I'm in Australia. (My ISP bought some IP addresses that were original regional France, back in the early 90s.)
So though this does have implications, the assumptions they utilise, like always, are not universal.
> My ISP bought some IP addresses that were original regional France
Cloudflare uses anycast, and IP geolocation is not how anycast works.
That may be true. But you still need to explain why Cloudflare serves me from France, and not Sydney, in that case.
Wow doesn't that make things really slow due to the RTT of the acknowledgements?
Australia. Our fastest networks are pathetically slow.
The L2 FTTN parts of the NBN have been known to have an RTT in the range of minutes, for some locations.
My own varies from 5ms, for those who don't assume my geography, out to 890ms for those that do.
for "normal people", that's a pain, but with enough resources,...
Although, it has edge use cases even for "normal people":
Eg. you suspect your coworker to be catfishing you on eg. discord, you know that he's in your city now, verify, then wait for him to leave for a vacation to somewhere abroad, check again.
This is actually pretty smart, and shows that this exploit could be chained with other information to identify a specific individual. This could also be used to e.g. check which world-travelling reporter is communicating with you.
It's not an edge case. Using multiple sources of information to paint a more complete picture is the norm. That's how marketing profiles work, for example.
It gets more interesting when you think about the impact on groups. Sending an image to a group is enough for all devices associated with that group to be identifiable from CloudFlare's side, who additionally see a giant chunk of unencrypted traffic from the same client addresses going to other web sites. Given Cloudflare's less-than-straight approach to sales, it is astonishing the words "secure" and "Signal" ever appear in the same sentence.
CloudFlare get to see a fuckton of metadata from private and group chats, enough to trace who originally sends a piece of media (identifiable from its file size), who reads it, when it is read, who forwards it and to whom. It really doesn't matter that they can't see an image or video, knowing its size upfront or later (for example in response to a law enforcement request) is enough.
> Given Cloudflare's less-than-straight approach to sales, it is astonishing the words "secure" and "Signal" ever appear in the same sentence.
This is an overly binary take. Security is all about threat models, and for most of us the threat model that Signal is solving is "mainstream for-profit apps snoop on the contents of my messages and use them to build an advertising profile". Most of us using it are not using Signal to skirt law enforcement, so our threat model does not include court orders and warrants.
Signal can and should append some noise to the images when encrypted (or better yet, pad them to a set file size as suggested by paulryanrogers in a sibling comment) to mitigate the risks of this attack for those who do have threat models that require it, but for the vast majority of us Signal is just as fit for purpose as we thought it was.
Maybe not individual warrants (at least not warrants to do non-scalable collections like hardware bugs in one's phone - I.e. warrants that, most users, with high probability, are not subject to). But mass surveillance, e.g. NSA, even with 'mass warrants' (e.g. Verizon-FISA warrant), that everyone is subject to, is probably in most people's attacker model. I don't have a study handy, but it seems reasonable that most users use signal to protect against mass surveillance and signal advertises itself as being good for this.
Also Marlinspike and Whittaker are quite outspoken about mass surveillance.
If cloudflare can compile a big part of the "who chats with whom" graph, that is a system design defect.
I highly doubt that signal does anything to help with mass surveillance. Signal started keeping people's name, photo, phone number, and contacts in the cloud protected by a "secure" enclave the NSA almost certainly has access to and hackers already got into (https://community.signalusers.org/t/sgx-cacheout-sgaxe-attac...) and even leaving all that aside, all anyone needs is a PIN that can be trivially brute forced. (https://www.vice.com/en/article/signal-new-pin-feature-worri...)
I thought it was digits only but see there's always been the option to use an alphanumeric passphrase as the "PIN". That prevents brute-forcing for anyone that bothered to use one, right?
It was only digits initially (https://old.reddit.com/r/signal/comments/oc6ow4/so_a_four_di...), with nothing preventing very easy ones like "1234", but even after they fixed it they continued to call it a PIN, and many people would just assume it is a number ("number" is right in the acronym), and often a very short one. Most people didn't want to set a PIN at all; they'd been nagged about setting one and then got nagged again and again to reenter it.
It was not clear to most people that their highly sensitive info was being uploaded to the cloud at all let alone that it was only protected by the PIN. I wouldn't be surprised if a lot of people picked something as simple as possible.
https://old.reddit.com/r/signal/comments/gqc2hu/the_new_pin_...
Their announcement post says "at least 4 digits, but they can also be longer or alphanumeric", though maybe the feature had launched before that was written? https://signal.org/blog/signal-pins/
Far from ideal I agree.
> Signal can and should append some noise to the images when encrypted (or better yet, pad them to a set file size as suggested by paulryanrogers in a sibling comment) to mitigate the risks of this attack for those who do have threat models that require it
Adding padding to the image wouldn't do anything to stop this "attack". This is just watching which CF datacenters cache the attachment after it gets sent.
Right, my bad on the ambiguity—I was replying to the OP's concern about image sizes, not the attack in TFA:
> It really doesn't matter that they can't see an image or video, knowing its size upfront or later (for example in response to a law enforcement request) is enough
That makes sense. Thanks for the clarification, my bad!
Hello, I'm an organizer for a system to coordinate multiple mutual aid networks, many of which are only organizing by Signal & Protonmail exclusively because they think they're secure and private.
People who are doing work to help people in ways the state tries to prevent (like giving people food) rely on this tech. These are the same groups who were able to mobilize so quickly to respond to the LA fires, but the Red Cross & police worked to shut down.
This impacts the people who are there for you when the state refuses to show up. This impacts the future version of you who needs it.
Most people aren't disabled, yet. Doesn't mean they don't need us building infrastructure for if/when they become disabled.
What groups did the police and Red Cross shut down? Any links?
In any geopolitical crisis, you tend to have victims on both sides be prevented from getting relief, except when the one side is imperial.
The powerful entities tend to prohibit relief to the oppressed side, even making it illegal.
Someone should tell anyone who seeks confidentiality that no email is secure. Use Signal and enable the data retention (i.e., automatic message deletion) feature. By itself that is not perfectly secure, but it's a start.
The people involved are likely all using Protonmail. So that would mean TLS for the connection to Protonmail with E2EE for messages passing through Protonmail.
Not sure that encrypted email in general would be less secure than, say, Signal. Since Signal is an instant messenger on a phone it might actually be less secure[1].
[1] https://articles.59.ca/doku.php?id=em:emailvsim
This is why I say that it's overly binary, not incorrect. Some people do have such needs, and Signal can and should fix this for those people.
For people who think Protonmail is secure: it's secure to the same level as mail.yahoo.com :)
I think the threat model of enough signal users to matter is nation-state actors, and signal should be secure against those actors by default so that they may hide among the entire signal user population
>It gets more interesting when you think about the impact on groups. Sending an image to a group is enough for all devices associated with that group to be identifiable from CloudFlare's side,
Doesn't this open up the possibility to identify groups that have been infiltrated by spies or similar posers? If you use this method to kinda-sorta locate or identify all the users in your group and one or more of those users ends up being located in a region where you should have no active group members then you may have identified a mole in your network.
Just thinking out loud here since there's no one else home.
>If you use this method to kinda-sorta locate or identify all the users in your group and one or more of those users ends up being located in a region where you should have no active group members then you may have identified a mole in your network.
...unless they happen to be using a VPN for geo-unblocking reasons or whatever.
If you're in a group like this where people are seriously concerned about their location being discovered by governments or by their own contacts, anyone in that group who is not already on a VPN all the time is either ignorant or nuts.
Communication of any sort over any channel risks sharing location information. Silence is secrecy.
I wonder if we'll see assets being padded to some common byte sizes to combat this.
Hi there, Signal dev here. We do, in fact, pad attachments to a limited set of bucket sizes.
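To illustrate what bucketed padding means in general (this is not Signal's actual scheme; the power-of-two buckets and the 1024-byte minimum are assumptions picked for the example):

```python
def padded_size(size: int, smallest_bucket: int = 1024) -> int:
    """Round a plaintext size up to the next power-of-two bucket (hypothetical scheme)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    bucket = smallest_bucket
    while bucket < size:
        bucket *= 2
    return bucket


def pad(data: bytes, smallest_bucket: int = 1024) -> bytes:
    """Append zero bytes so the length lands exactly on a bucket boundary."""
    # Padding happens before encryption; the real length must be recorded
    # elsewhere (e.g. inside the encrypted message) so it can be stripped.
    return data + b"\x00" * (padded_size(len(data), smallest_bucket) - len(data))
```

With this toy scheme a 3000-byte and a 4000-byte attachment both come out at 4096 bytes, so an observer who only sees sizes cannot tell them apart.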
Nothing stops Cloudflare from inspecting the file contents, or using a hash to distinguish between identically-sized files.
The only reason we assume they don't do this is because it's a waste of resources for no good reason. But what if somebody gave them a good reason?
Aren’t the files end-to-end encrypted? How would they inspect the files?
yeah, the person you're referring to is confused because the Cloudflare HTTP service terminates TLS and presents a Cloudflare certificate, but that doesn't have anything to do at all with Signal's E2EE which is not based on HTTPS PKI
Last time I used Cloudflare I think their settings default to only "Origin SSL/TLS" (or whatever they call it), which wouldn't encrypt anything between Cloudflare and the origin, it would only encrypt data between Cloudflare and the end-user/browser.
But the Signal client encrypts images before sending them to the Signal server. If it padded out the images at that point, the images would all be indistinguishable from each other unless Cloudflare were actually able to break the encryption (which would completely undermine the entire security model).
So the image is uploaded for each recipient with an individual key?
Ah yes, I'm sorry, I mistook the context. If Signal encrypts the images E2E, you're right that it wouldn't matter what Cloudflare does, especially if padded.
TLS doesn’t matter for End-to-end encrypted stuff though, you could exchange the data over Telnet and it would still be secure. The content itself is already encrypted before being transmitted and can only be decrypted by the receiver.
AFAIK the attack described by OP only works if the attacker knows the (randomly generated) URL of the image, which probably means they have a Signal client that can decrypt the image already. So the secrecy of the content is not at issue. The question is whether some specific person has received the same image, and from where.
Part of his attack requires disabling the cache on his (sender) side so that he doesn’t pollute the cache. That implies that both sides of the conversation share the same URL, which means Cloudflare could assume two IP addresses requesting the same URL on the Signal attachment domain are participating in a shared conversation.
Yeah, that's a problem. It is leaking metadata, not content.
Ideally, the image should be padded, encrypted with a different key, and given a different URL for each user who is authorized to view it. But this would increase the client's burden significantly, especially in conversations that include more than two people.
> , it is astonishing the words "secure" and "Signal" ever appear in the same sentence.
You misspelled "I do not understand what end to end encryption means"
It could be useful for correlation.
Say for example that you're an investigating agent in regular contact with someone.
A single data-point wouldn't mean anything. However, a sequence of daily image retrievals might tell you that they spend 90% of their time in WA and 10% of their time elsewhere.
That information alone still might not mean anything, but if you also have a specific suspect in mind, it may help confirm it. Or if you have access to the suspected person directly, if you're able to also befriend their "clean" profile, you might be able to pull the same trick and correlate the two location profiles.
De-anonymisation isn't about single pieces of information, but all information helps feed into a profile to narrow suspects or confirm suspicions.
( By "agent" I just mean a person, not an AI agent nor Law enforcement, who could presumably just get the information more directly from cloudflare. )
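To make the "sequence of daily retrievals" idea concrete, here is a minimal sketch of turning repeated observations into a rough location profile. The airport-style codes and the ten-day sample are made up for illustration, and `location_profile` is a hypothetical helper, not part of any tool mentioned in the thread.

```python
from collections import Counter


def location_profile(observations):
    """Return the fraction of observations seen at each data center code."""
    counts = Counter(observations)
    total = sum(counts.values())
    # most_common() orders the result from most to least frequent.
    return {region: n / total for region, n in counts.most_common()}


# Hypothetical sample: one cache probe per day for ten days.
daily = ["SEA"] * 9 + ["ORD"]
print(location_profile(daily))  # {'SEA': 0.9, 'ORD': 0.1}
```

On its own this says little, but two such profiles (say, a pseudonymous account and a "clean" one) can be compared side by side.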
you don't have to "befriend" them. you send a friend request because that defaults to a push notification for users with the discord app on their phone. Now, with signal, i don't use it so i don't know how initial chats start, or whatever. The discord one is 0-click because the PFP in the friend request is the payload delivered via PUSH.
And to someone else's point - they had to block the request on their end with a MITM to do the 1-click version on signal. No such MITM is needed with the friend request.
As an aside, one time i got doxxed hard in an IRC channel with several hundred active users. I had a suspicion of who it was, and i knew they lived in chicago. So i "accidentally" sent a link to "screenshot proof" that was hosted on one of my domains. there was 1 immediate click. instant. Chicago. "accidentally" because it looked like i pasted an email body.
Packed the real screenshot and a complaint to the ircadmin. they said "and so you dox them back?"
can't win for trying.
There's probably at least a few instances where you send someone you think is American a picture but it gets cached in Moscow, or vice versa. Or you post a meme to a Californian left-wing group and it gets cached in DC. Not hard to imagine situations where getting an unexpected rough location could be a valuable signal.
>Or you post a meme to a Californian left-wing group and it gets cached in DC. Not hard to imagine situations where getting an unexpected rough location could be a valuable signal.
Not really. Any public meme group is inevitably going to be monitored by intelligence agencies, and you should assume as such. Even if it isn't, I can imagine agitators from the other side joining the group with a Russian VPN to poison the well. If there's a private group of people that you supposedly trust, any competent mole is going to be using device/network level VPN to cover their tracks. Otherwise they're 1 click away (eg. if someone shared a link) from an opsec fail.
I would bet money almost no public meme groups are monitored by any intelligence agencies. And the few that are are mostly monitored only in the sense of being casually co-opted by state-sponsored trolls, with almost no attention from actual intelligence agency staff (in the way this thread implies, with investigations and deanonymization and such).
I'm sure they're "monitored by intelligence agencies" in the sense of having a line in a database/report somewhere (that probably no-one reads). If the technique mentioned in TFA can be used automatically (and I see no reason it shouldn't) then it will probably be incorporated in due course (if it hasn't been already) - it doesn't have to be 100% accurate, it's just one more datapoint to add to the mix.
You can also ping the same person multiple times, like once a day at different time of the day. That provides a more complete range.
"Deanonymization" doesn't have to refer to a full exact address. There are people who wish to conceal which country or region they live in, which this cripples.
There was a real example of that amount of information being relevant in the Silk Road investigation. Ulbricht accidentally revealed his timezone early on, which was useful to US authorities since it narrowed him down to being in the US, whereas without that information he could have been from anywhere in the world.
Not really.
Anyone who wants to conceal what continent they're on will also be using a VPN 24/7, or will have the proxy setup in Signal (AKA running 24/7), which defeats this.
Yep: If your threat model includes an attack like this and you're not always on a VPN already, you're likely already compromised.
This is a neat demo, but it should not fundamentally alter the way that anyone is using Signal. Either it doesn't matter to you or you already have mitigations in place.
> If your threat model includes an attack like this
The problem is, nobody's threat model includes state level attackers, until one day it does.
Back when Ulbricht was publicly asking questions using an easily uncovered identity, he wasn't thinking that in a few years he'd have the full force of every relevant TLA in the US (and Five Eyes/14 Eyes) trying to track him down.
But he also chose to go on and found a darknet narcotics service. Most people don't do something like that.
Yes, it's vogue right now to speculate that what you're doing right now could suddenly become illegal in a new administration, but if that happens tomorrow, most of us would be one of hundreds of thousands who are all in the same boat. For that reason, most of us won't get targeted retroactively for behaviors that were legal at the time, and we have the option to reevaluate our security posture when the political landscape changes.
But yeah, if you're actively speculating about starting an illegal service today, you should definitely have a better security posture than Ulbricht did.
> but if that happens tomorrow, most of us would be one of hundreds of thousands who are all in the same boat. For that reason, most of us won't get targeted retroactively for behaviors that were legal at the time
I'm sure that would be part of any oppressive government's plan. They wouldn't go after people for their past "transgressions" as long as they keep their heads down, do as they're told, and don't cause any trouble. At that point you're morally compromised.
> Yes, it's vogue right now to speculate that what you're doing right now could suddenly become illegal in a new administration, but if that happens tomorrow, most of us would be one of hundreds of thousands who are all in the same boat
I'm probably more paranoid than needed, but I'm way less sure than you seem to be about being able to hide as one of a few hundred thousand needles in the US public haystack.
I, for one, would be terrified right now if I were the child of illegal immigrants. The hateful portion of the hard right are gleefully looking forward to ICE rounding up hundreds of thousands of people.
You should probably be concerned if you were publicly pro-choice a few years back. Or if you came out as trans. Or got gay married. Or any of probably hundreds of other things that most people would have thought perfectly safe and socially reasonable in the recent past, which are looking much less so today.
When I was ~15 and this was ~2004, some friends and I ran a forum with a lot of users and did some bad things where we would track down repeat banned users and screw with them. (In our defense, they were screwing with us.)
We used everything, from browser fingerprinting (and EFF only made the world aware of it 6 years later), looking them up in databases, tracing every digital evidence they left, etc.
Every little thing counted. What I learned is that people leave a lot of traces and you can collect these traces to dox them. The way you write is even sometimes fairly identifiable.
It's not stretching it. The expectation is that Signal does not reveal any observable aspect of your IP address or location when receiving messages on it.
Whether this specific level/type of deanonymization is a problem for your particular use case is an entirely different question. Personally, I wouldn't even care if mutual contacts were to see my IP address outright (and they do for calls), but I'm not every user.
I don't care if users see "my" ipv4 because cgnat. I think i don't care if they can see my ipv6 because each machine gets a /64 to itself, that's the logic, right?
But my PBX and my matrix server both use coturn. Our 10 user "private" PBX we have to VPN into a fortigate in a DC to use, but to my understanding, there's literally no way to eavesdrop on those calls without already compromising the server it's running on, and if that's the case, no extra VPN steps or whatever will help.
anyhow even with a real, publicly routable IP, stock windows 11, stock macos (used to be true), and most linuxes won't get compromised by stuff like backorifice or whatever else l0pht put out as "remote administration tools". that is, there usually aren't any listening ports on a public IP these days. Shield's Up!
> to my understanding, there's literally no way to eavesdrop on those calls without already compromising the server it's running on
That's probably correct (with the caveat that I suspect NSA/FSB/MSS/Mossad/whoever can reasonably be assumed to have backdoored Fortinet)
There is still the problem that an attacker with "global passive observer" capabilities (which almost certainly includes most non 3rd world nation states, and probably a few of the more problematic 3rd world ones too) can still do traffic analysis to uncover your social network (or criminal/terrorist/whistleblower/journalistic network) by identifying the call traffic endpoints.
> I think i don't care if they can see my ipv6 because each machine gets a /64 to itself, that's the logic, right?
I suspect you're looking at that wrong.
It's each internet connection that gets a /64, not each machine. Your ISP hands you a /64 and you can do whatever you like with it on your home(/corporate) network.
So you can choose from about 18 quintillion (2^64) IPv6 addresses for any machine behind your ISP/internet connection, but the top half of your IPv6 address uniquely identifies that connection, and the ISP can connect that to your account/payment details, with 4 billion times as much precision as an IPv4 address.
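The /64 arithmetic is easy to check. A quick sketch, assuming the ISP delegates exactly a /64 per connection (that is the premise above; plenty of ISPs hand out a /56 or /48 instead):

```python
# Bits left for hosts inside a single /64 delegation.
HOST_BITS = 128 - 64
addresses_per_64 = 2 ** HOST_BITS

# Number of distinct /64 prefixes versus the whole IPv4 address space.
prefix_count = 2 ** 64
ipv4_space = 2 ** 32
prefixes_vs_ipv4 = prefix_count // ipv4_space

print(f"{addresses_per_64:,}")   # 18,446,744,073,709,551,616
print(f"{prefixes_vs_ipv4:,}")   # 4,294,967,296
```

So there are roughly 4.3 billion times as many possible /64 prefixes as there are IPv4 addresses, which is where the "4 billion times" figure comes from.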
Exactly. Especially when considering that Signal was often advertised as that *one* privacy friendly open-source messaging solution in a world dominated by data-collecting demons like WhatsApp, etc. I don't think even WhatsApp lets such status details leak; notwithstanding whatever they might be doing with the user data on the backend.
I can send a link in Whatsapp to a domain I control and track if clicked. How is that different?
If I know someone on Signal I can now check if they’ve left the country.
Or send this to a bunch of signal users whom you suspect one of them being a particular person, and if you know that the person you are looking for is going to travel you can send it once before and once after. Then see which of these users were in the home city and subsequently in the destination city.
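The before/after trick is just a filter over two sets of observations. A minimal sketch, where the account names, data center codes, and the `matching_candidates` helper are all hypothetical:

```python
def matching_candidates(before, after, home, destination):
    """Accounts seen at `home` before the trip and at `destination` after it."""
    return sorted(
        account
        for account, colo in before.items()
        if colo == home and after.get(account) == destination
    )


# Hypothetical observations: account -> data center that cached the attachment.
before = {"alice": "LHR", "bob": "LHR", "carol": "SYD"}
after = {"alice": "LHR", "bob": "JFK", "carol": "SYD"}

print(matching_candidates(before, after, "LHR", "JFK"))  # ['bob']
```

A target on a VPN, or one who simply did not open the second message, drops out of the result, so an empty list proves nothing.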
A VPN obfuscates this. Assuming a target is even remotely aware, you might think they are in Australia, while they're actually in Nova Scotia
Say I send a message to someone who has a phone with push notifications enabled, showing message previews. Will the phone still be connected to the VPN when it wakes up to display the message? Because my iPhone doesn't seem to stay connected to my VPN when it sleeps, at least not reliably.
There really should be a "never use the internet without VPN" mode on devices.
Valid point. Afaict, vpns I've used route all network activity regardless of phone state, but that's likely dependent on the service.
I don't see how that can work for the push packet itself, cause I thought that's specially handled by some low-power hardware on the phone while the main parts are shut off. Unless that hardware is also managing the VPN connection, which I doubt.
So if there's no always-on hardware maintaining that VPN connection, probably the phone is going to wake up without it. And even if it auto-reconnects, it'll probably load stuff before it's connected to the VPN.
Yeah, probably only if mobile data is turned off so the packet doesn't hit the mobile network, and only wifi calling /messaging could the VPN hide location.
The real attack is that a law enforcement agency can trivially subpoena CloudFlare with the attachment URL, and they will hand over the IP address of the recipient of the image along with whatever other requests they made through the CDN, which can pretty precisely and rapidly de-anonymize you.
Caching attachments at a single nice, big, juicy honeypot like CloudFlare is one of the reasons Signal's privacy guarantees don't feel totally solid to me. I get that it's pragmatic, but feel there must be a better way.
Does the caching occur even if both users are online when the attachment is sent?
Indeed, "incredibly precise estimate of the user's location" feels like an exaggeration. But still, very interesting!
I'd say it'd be useful for very specific use cases. Such as finding out what country Jia Tan, the XZ Utils backdoor attacker, is in.
I wonder if it'd be a good idea for Signal to implement a "simple" mode that would deactivate most features in order to reduce the attack surface for people who really think they are being targeted. Would that be a good idea ?
Why does it need to be cached though?
The only case where it might be downloaded more than once is if the user has multiple clients. Not that common and still very little traffic.
Combined with other information, it may identify someone reliably, just like you can with zip code, age and gender. For example, if you know this person is part of a group with members in several locations, or if you can corroborate someone's movements, etc.
For example, imagine someone suspected of sharing sensitive information with a journalist. They might have a short list of suspects, and use this technique to confirm which one it is. They might identify which journalist it is - maybe only a limited number cover this beat.
Or you want to find a specific journalist, and you find out that they just arrived to a certain city, and there are only three hotels in that city...
That doesn't tell you whether that journalist is investigating you. Identifying them as the recipient of a Signal message from a suspect is valuable information.
It only takes 33 bits to identify someone. This reveals a couple of bits.
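For anyone wondering where "33 bits" comes from: it is log2 of the world population. A rough sketch, assuming about 8 billion people; the 10 million figure for "people plausibly served by one data center" is purely an illustrative guess, not a measured number.

```python
import math

WORLD_POPULATION = 8_000_000_000  # rough assumption


def bits_to_identify(population):
    """Bits needed to single out one person among `population`."""
    return math.log2(population)


def bits_revealed(before, after):
    """Bits gained by shrinking the candidate set from `before` to `after`."""
    return math.log2(before / after)


print(round(bits_to_identify(WORLD_POPULATION), 1))  # 32.9

# Hypothetical: narrowing from everyone to ~10 million people near one POP.
print(round(bits_revealed(WORLD_POPULATION, 10_000_000), 1))  # 9.6
```

This simple addition of bits only works if each new fact is independent of what you already knew.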
Not really. It's only true if the bits are uncorrelated, and you can acquire additional bits of information. I don't see how you can go from "this guy on the internet lives near Albuquerque, New Mexico" to "this guy is Walter Hartwell White, and lives at 308 Negra Arroyo Lane, Albuquerque, New Mexico, 87104" without massive opsec failures.
If you want to extend the analogy, Gus Fring's threat model for RFP contractors at the superlab required flying people into the United States and driving them for days before reaching the final destination. i.e. If you aren't selected for the final proposal, the most you should know is the lab is "somewhere reachable by driving from the United States".
Locating the superlab to within 800 miles would break Gus' threat model.
Combined with the information the police have, which is that a new form of "blue meth" is spreading across the American southwest, a reasonable conclusion would be that the "underground superlab" is where the meth is being manufactured. It's independent corroboration of a major manufacturing operation occurring in the United States in the exact region where a new drug is taking off.
This is useful, since it helps rule out the meth being smuggled in from Mexico. It also makes the lab a high priority target, because a DEA agent investigating doesn't need to liaise with a foreign government, and you can secure a domestic prosecution + American prison time instead of attempting to extradite the cooks.
It also allows me to send a detailed memo about the superlab to ASAC Schrader's office in Albuquerque telling him about a threat in his jurisdiction, rather than circulating a brief summary about this superlab in the weekly intelligence briefing sent to all high-ranking DEA officials they probably don't read.
Brilliant. Please consider writing a book about things like this.
Every little bit helps.
You can plot the timestamps of every message, read receipt and emoji reaction, which gives you the timezone and hints at work schedule, commute duration and vacations.
Often people will post photos or have profile pictures.
Say you have a photo taken at a random mcdonalds. That'd be 36'000 locations. Imagine cloudflare location and timezone help you narrow it down to new mexico. That's 80 locations. Small enough that you can look at every single one using street view and check where the photo actually was taken.
Now you can subpoena the McDonald's cctv footage and figure out who sent that picture.
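The timestamp part can be sketched in a few lines. This is a crude heuristic, not a known tool: it assumes the quietest 8-hour stretch of activity is sleep starting around 23:00 local time, and `guess_utc_offset` plus the sample data are hypothetical. Timestamps must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def guess_utc_offset(timestamps, sleep_start_local=23, sleep_hours=8):
    """Guess a UTC offset (in hours) from when an account is active."""
    # Histogram of activity by hour of day, in UTC.
    counts = [0] * 24
    for ts in timestamps:
        counts[ts.astimezone(timezone.utc).hour] += 1

    def window_activity(start):
        return sum(counts[(start + i) % 24] for i in range(sleep_hours))

    # Quietest contiguous window, assumed to be the sleep period.
    quiet_start = min(range(24), key=window_activity)
    # Map the difference onto the range -12..+11.
    return (sleep_start_local - quiet_start + 12) % 24 - 12


# Hypothetical account that is active 08:00-22:00 in UTC-7.
mst = timezone(timedelta(hours=-7))
sample = [datetime(2025, 1, 6, h, 0, tzinfo=mst) for h in range(8, 23)]
print(guess_utc_offset(sample))  # -7
```

Shift workers, night owls and scheduled messages will throw this off, so treat the result as a hint to combine with other data.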
You can almost certainly narrow down the McDonalds with a wide variety of things - this example is fairly contrived.
If you can see outside of the McDonalds for street view to be usable, you're almost certainly able to determine what country it is in, and potentially the exact location, depending on what is visible outside.
If it's a picture that shows the menu, well, street view isn't likely to be super useful, but you'd have a trivial time figuring out what country it is in at that point - menus vary from country to country, even when they are still in English.
New Mexico has relatively few McDonald's restaurants because New Mexico has a fairly low population - only 2.1m for the whole state. With that in mind, it seems unlikely that Cloudflare has a close enough POP for you to be able to specifically decide it's NM.
If I can see enough for Street View to be able to confirm location, it seems like I can just search via the data there and get far more narrowed down results. If I can see a Burger King and a Best Buy outside from the picture, I can just use one of the many mapping services with APIs to get a list of all McDonalds locations within a tenth of a mile of a Burger King and Best Buy and look through a smaller list. If I'm confident of the time zone, like you suggest we should be able to be, then that's an even smaller list.
I'm not saying this attack is useless by any means, but I don't see a world where the sharing of the pictures to begin with isn't the most significant opsec failure and doesn't open you up to being de-anonymized in a myriad of other ways.
>Often people will post photos or have profile pictures.
>Say you have a photo taken at a random mcdonalds. That'd be 36'000 locations. Imagine cloudflare location and timezone help you narrow it down to new mexico. That's 80 locations. Small enough that you can look at every single one using street view and check where the photo actually was taken.
Sounds like the bigger opsec failure is posting the pictures, and leaking the cloudflare POP only makes the search slightly easier.
> Sounds like the bigger opsec failure is posting the pictures, and leaking the cloudflare POP only makes the search slightly easier.
I would not define 3 orders of magnitude as "slightly easier".
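For scale, using the figures quoted upthread (36,000 locations worldwide, 80 in New Mexico; those are the earlier commenter's numbers, not something checked here):

```python
import math

worldwide = 36_000   # figure quoted upthread
new_mexico = 80      # figure quoted upthread

reduction = worldwide / new_mexico
print(reduction)                         # 450.0
print(round(math.log10(reduction), 2))   # 2.65
```

So it is a 450x reduction, a bit under three orders of magnitude.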
There is a fun post that explores this idea via an anime called Death Note.
https://gwern.net/death-note-anonymity
Repeat the attack daily for a few weeks and you might get a pattern of movement. Of course if the target hasn’t left their general area then this won’t help. But if you’re a nation state watching a target move between multiple international locations, you could match this up with passport travel data to significantly reduce the anonymity set.
Seems contrived. What type of a person cares about deanonymization attacks and nation-states trying to find him, but doesn't have an always-on VPN? Even without this attack, not using a VPN means you're 1 wrong click/tap away (if you accidentally clicked on a link) from leaking your IP.
Right, agreed that VPN is the primary mitigation against this from a user perspective. But opsec is hard, especially when the attack can be triggered by a notification when the victim might not be expecting it and might not have VPN enabled (e.g. maybe they only enable VPN when using Discord).
(But notifications are already a bad idea for opsec anyway.)
>But opsec is hard [...]
That's why the attack is contrived. If you have poor opsec you don't need this attack at all. You can probably get the victim's exact IP by getting him to click on a link, or sending him an email. If he has good opsec he's going to be using a VPN that renders this attack useless. For this attack to be valuable you need a guy who has such good opsec that you can't get his location any other way, but for whatever reason isn't using an always-online VPN.
Two or three very small opsec failures equals one massive opsec failure.
It's leaking so many bits idk what else you would call it, deanonymization isn't a one shot thing and it's a spectrum not a binary outcome
WhatsApp has an option to disable link previews.
Surprised signal doesn't have this option.
I only message people I know on Signal anyway.
Edit: it seems signal does have the option
CloudFlare has the actual IP address that viewed the image. Which means some powerful (or rich enough) actors can get it.
This is very very bad.
This was... always, the case though? For any CDN service? How do you serve traffic to people without knowing where to send it?
Agree. Though a valid concern might be that a victim uses signal because of E2EE, thinking no 3rd party involved in delivery, not knowing/thinking about a CDN used.
Onion protocol.
> attacker can use the cache geolocation method to pinpoint the recipient’s location
Agree, good writeup, but also a stretch to say they are "pinpointing" anyone's location.
Send picture to multiple accounts, perhaps on different services, the links that are cached at the same data center can be more confidently believed to be related.
That's why federated setups such as Matrix are better. It is much harder to deanonymize a set of users on different servers in a group chat.
Looks like it's possible to hit 2 datacenters due to load-balancing, which would narrow it down a bit more. Suppose you do this repeatedly as the target is moving around, hitting even more datacenters.
Did you see the GIF? It's able to triangulate.
Even time zone leaks are privacy issues, and the leak we're discussing is more fine grained than time zone.
Imagine sending a friend request to bin Laden's videographer and getting a reply from Pakistan while your entire military is looking for him in Afghanistan?
There's definitely cases where this is going to be immediately used. Shit, just using it to scrape Cloudflare for additional metadata on everyone from other user table leaks is probably valuable data. Even triangulation over time as they move around is going to get a more precise result. Maybe you find a vulnerability that takes that cloudflare node offline and run it again, repeat until you've got a fairly small radius they could be in.
> cached in a data center near that user
Not necessarily. Cloudflare is very upfront that they do not cache everything, and the time things are cached can vary greatly.
The kid keeps talking about "deanonymization" and he has no idea what the term actually means.
Mmmm "qualified deanonymization" perhaps?
Headline feels like a click bait :)
> (no other users near the data center).
Yeah and in that case there won't be a data center because who puts one in places without clients nearby? :)
This is not unique to signal. URL strings can contain identifying information regardless of where they are shared or posted. For example, if you send a link that ends with a string of characters, these may correspond to a geographic location or browser settings. Blogger urls used to be geolocated, such as .ca for Canadian viewers. It is always safe to strip out unnecessary characters if you're paranoid.
timing and location can usually prune things down to enough data about a person.
Cool writeup with some interesting techniques and approaches!
I'll echo the other comments and say "deanonymization" is stretching the definition of the word, along with "grab the user's location", as it isn't anything near precise. 150 miles is approx. a 2-hour drive on the highway from Atlanta, GA to Augusta, GA. In that radius, there's probably 700,000+ people.
I do think the auto-retrieve attachment feature of Signal is slightly concerning, as for a private messenger I'd expect there to be an option to turn it off (like turning off JS in Tor). I don't know if I'm not looking deep enough, but there doesn't seem to be a feature for that.
Signal appears to take a useful-by-default approach that balances privacy and ease-of-use in order to encourage adoption by the masses, I'd assume most people that are really concerned are hardening Signal, similar to what is in this guide: https://www.privacyguides.org/articles/2022/07/07/signal-con... . They've always recommended a VPN / proxy + a modification of settings for more high-security scenarios.
Caching isn't going anywhere, and neither is CloudFlare. The DoSing days of old in P2P multiplayer lobbies with exposed IPs seemed to carry more of a threat than this, CloudFlare's response seems to be the best out of the 3. Caching sensitive information is never recommended and the onus is on the application doing the communicating to tell their CDN / middle-service to not cache specific items.
> "deanonymization" is stretching the definition of the word, along with "grab the user's location", as it isn't anything near precise.
You'd think so, but you would be surprised how quickly this adds up to other details people share, like "oh I just drove 15 minutes to get Starbucks" or something to that effect, small things that eventually add up to a precise location over time.
> you would be surprised how quickly this adds up
Yes, but if social engineering is involved and tracing back through user conversations across a platform, it's hardly a vulnerability, let alone one deserving of a bounty. The way this is currently functioning is intended functionality, and can be further locked down depending on the user's threat model.
This can essentially be classified as opsec failure for the Signal user. If they're trying to hide from a hit in a 300 mile radius, they've got bigger problems to worry about, and should already be using a VPN setup.
Every time you click on a link your external IP address is exposed, is this a vulnerability? Being online without a VPN / proxy is inherent consent to have your external IP & other required items to be shared with services / middlemen.
When it comes to Discord, if you have this strict of a threat model and you're still using it, idk what to tell you.
If I can send you a link and be guaranteed that you click on it. Then that’s definitely a security issue.
Then it's a good thing that this isn't being claimed
The comment says: Every time you click on a link your external IP address is exposed, is this a vulnerability? Being online without a VPN / proxy is inherent consent to have your external IP & other required items to be shared with services / middlemen.
The fact that a user's IP is exposed when they click on a link is only relevant to the original post if a user would do this automatically and without realizing. The original post alleges that they can send someone a message on Signal and have the user automatically and somewhat unknowingly load a resource from a server. Sure, the author doesn't claim they have much control over the resource or the server, but they do show how you can check which server the user accessed and how that leaks information about the location of the user to a certain extent.
This is all the classic dismissals of security issues, including blaming the user.
> opsec failure for the Signal user
Signal's mission is to provide security for users who don't know the word 'opsec'.
Blaming the user is sometimes what it boils down to. Security includes a balancing act that involves usability, and Signal is firstly targeting the masses, but includes settings that can be configured for high-risk scenarios.
This "vulnerability" requires the user to have none of the normal things a person with a more extreme threat model would have already configured. EZPZ guides online on locking down Signal.
It's just like an iPhone. They don't ship with Lockdown Mode enabled by default, as it hurts the average consumer's usability. Signal at minimum will ensure no one is snooping on your messages, and it's up to the user whether they want to take that further.
If your definition of not providing security is allowing someone to know they exist on a continent, then that user's ISP has performed terribly as well since they aren't bouncing their signal around the world by default.
> Blaming the user is sometimes what it boils down to.
At least we agree about your argument. :)
> Signal at minimum will ensure no one is snooping on your messages, and it's up to the user whether they want to take that further.
Signal also secures metadata, including the participants in the conversation. That is undeniable - they have gone through considerable development investment to provide that feature.
> that user's ISP has performed terribly
Now we're blaming the ISP. If your app doesn't work with your users and ISPs, who does it work for? And how does a non-technical end-user know whether or when to trust you?
> When it comes to Discord, if you have this strict of a threat model and you're still using it, idk what to tell you.
I mean, you just never know... I've seen a lot of wild things, I've seen what drives people to doing crazy things. Just look up the "Deadly Runescape E Dater" who flew from the US to the UK to stab the girl he e-dated.
You can disable the auto-download. Settings > Data and storage > Media auto-download, you can choose what to auto download for mobile data/wifi/roaming.
Thank you! That's what I get for quick scrolling through the settings. I for sure thought it would have been under Privacy (for this concern), but that makes sense too.
Ah I made the same mistake.
Whatsapp has this option and I'm pretty sure it is in privacy settings.
So, just to confirm my understanding, if one goes into those settings and disables all auto-download, that helps- but, then a user will manually download images, correct? Are they still vulnerable to this issue then at that time?
A user might download images and yes, if they download images Cloudflare will show which datacenters have cached that image. They might also install an APK you give them or run that taylor_swift_concert.mp4.exe as well.
If I host an image on Cloudflare and put the URL here, I'll know which CF datacenters are near HN users who bother clicking the link as well.
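For reference, the information in question sits in two response headers: `cf-cache-status` (HIT, MISS, and so on) and `cf-ray`, whose suffix after the last dash is the code of the data center that answered. A minimal sketch of reading them from a headers dict; the sample values are made up, and `cache_observation` is a hypothetical helper that does no network access itself.

```python
def cache_observation(headers):
    """Return (data_center_code, cache_status) from response headers."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    ray = lowered.get("cf-ray", "")
    # cf-ray looks like "<ray id>-<data center code>".
    colo = ray.rsplit("-", 1)[1] if "-" in ray else None
    status = lowered.get("cf-cache-status")
    return colo, (status.upper() if status else None)


# Made-up sample headers for illustration.
sample = {"CF-Ray": "0123456789abcdef-ORD", "CF-Cache-Status": "HIT"}
print(cache_observation(sample))  # ('ORD', 'HIT')
```

A HIT only tells you that someone routed through that data center fetched the URL recently, which is why the URL has to be one nobody else has loaded.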
hmm. I find the auto-download setting in the mobile app but not on desktop (mac). anyone know?
(some comments seem to suggest that the desktop app always auto-downloads)
it looks like it can’t be disabled for view-once media (or at least, that’s what the settings screen says)
I wonder if view-once media is even handled the same way as a regular attachment (using CF) or is sent more like a regular message.
I imagine if one really wanted it to be view-once, it wouldn't go to a CDN.
Thanks for pointing this out!
I think view-once media there means media hosted on signal servers, not remote servers? But not entirely sure.
I'd love a hard answer to this if anyone knows or has time to look at the source code.
https://github.com/signalapp
Random unrelated point: in a 100km radius circle between Atlanta and Augusta there are ~2,000,000 people (calculated using https://www.tomforth.co.uk/circlepopulations/ )
Haha thank you for doing the math! I was lazy and just added the populations and a plus at the end.
Cool! Contrary to some of the other posters I think this definitely counts as deanonymization, or at least is close enough. How anonymous would satoshi be today if we had his location to within 250 miles?
Repeated applications of this attack (maybe disguised somehow?) could let you track someone’s travel over time, and it usually only takes 4-5 zip code sized locations to uniquely identify someone.
The counter point is that anyone who cares about being anonymous is using methods to disguise their identity that cannot be compromised by this attack, e.g: a VPN. Plus, there are much more effective versions of this attack, like sending a link to an endpoint that you control -- getting someone to click a link isn't hard if you're considered trustworthy enough to send them notifications. And less technical versions, like correlating when the user is online vs. offline with timezones around the world.
The method that both Apple and Cloudflare use in their own privacy software (iCloud Private Relay for apple, WARP for Cloudflare) is specifically based on the idea that your region is not information that reveals your identity. If you enable Apple Private Relay, your origin IP will be obscured but the IP your traffic is routed through will be in the same country -- same principle.
https://www.apple.com/icloud/docs/iCloud_Private_Relay_Overv...
This attack is academically interesting and novel but it's not "deanonymization".
On iCloud Private Relay, go to settings and select “use country and time zone” instead of “use general location.”
Now you’re no longer “within 250 miles,” hell my phone geo IPs everywhere from Louisiana to New Jersey, which are not even “in my time zone,” but there you go.
This setting was pissing meta/Facebook off big time because they also couldn’t narrow me down to a precise geographical area, resulting in much nagging and whining about “was this you signing in from [shreveport]?” and frequent account lockouts, password resets, and endless requests to approve my logins from a device that’s already logged in before I finally said to hell with it and deleted FB a few days ago.
I figure if a privacy setting makes meta mad, then it’s ... probably ... a good setting. Must really irk them trying to sell location relevant ads when my state changes every other time I unlock my screen.
It’s a combined behavior of using private browsing and refusing to install their app, thereby giving them a permanent supercookie no matter what my IP is, so if you don’t like the sound of this it [might not] affect you if you use their apps. “X” does it too, just look up “inferred identity+ twitter” on google.
I’m editing out a tall claim in the last paragraph of this for some other time when I’m less tired and have sources next time we’re on the subject.
> The counter point is that anyone who cares about being anonymous is using methods to disguise their identity that cannot be compromised by this attack, e.g: a VPN.
Yes unless Apple is doing Apple things and ignores VPNs for things like push notifications…
https://x.com/mysk_co/status/1579997801047822336
I am not sure I understand what you mean by "trustworthy enough to send them notifications". Do you need anything other than one's phone number to send them a signal message?
The recipient would need to have this enabled, though it is by default. You can deactivate allowing others to initiate chats with you from your phone number (Settings > Privacy > Phone number)
Satoshi's possible home IP address actually did leak shortly after Bitcoin's release, though it wasn't realized until years later.
(It definitely may not be him and might instead be a random early user. But I think there's a moderate chance it's him.)
Details: https://news.ycombinator.com/item?id=29728339
(I don't advocate attempting to find and publish his name and address, since it'd make his life difficult, but it's still very interesting in the abstract as a curious unsolved mystery for all these years despite the number of eyes on it.)
How many people live in a 250 mile circle around New York?
I think the more important question is how many people in the world don't live within a 250 mile circle around New York? An investigator could potentially cut their geographical search down by 95%+.
Also the attack can be performed multiple times and if a person travels it could narrow down the possibilities quite a lot.
They had an example of the attack getting two locations back, Las Vegas and San Francisco.
So the target is somewhere in the many thousand square miles in the circle that encompasses almost half the US!
Let's say they travel between NY and LA, how many sources of data will you need to know who was in NY on a specific date and LA on a second date? Feels like only the government can reasonably locate that.
The government is a plausible adversary for Signal
FWIW if it's the government, wouldn't they be able to just get direct access to Cloudflare logs - in real-time even - and thus observe and track the specific incoming connection to fetch the cached image?
How many people live in a 250 mile circle around their Cloudflare POP?
Which Cloudflare POP I hit depends on which RSP I use. In the country I live in, our biggest RSP peers with Cloudflare in a neighboring country (as it is much cheaper for Cloudflare to send traffic via that RSP's peering exchange there). So something like 40% of traffic will seem to be from an entirely different country than reality.
My RSP is a small RSP which until fairly recently only had two POPs in the entire country. So regardless of where you lived, customers of my RSP would have traffic exiting onto the internet via only one of two exit points. Rural users would seem to be coming from one of the two largest cities in my country even if they are easily >250 miles away from their particular POP. They do peer with Cloudflare but obviously only at the locations where they and Cloudflare are in the same city (and I'm not sure this is the case -- it is possible all national traffic to Cloudflare actually goes via the one POP in our biggest city).
The only reason this attack identifies the city I happen to be in is because I live in the same city as my little RSP's biggest POP and Cloudflare happens to peer with that RSP at that POP. Where I am is a large city so doesn't narrow things down very much -- but even worse is that whoever is looking for me would actually need to look anywhere in my country.
I don't think I am a unique case, as internet routing is rarely the most direct path for various technical, financial, political, etc. reasons.
De-anonymization is definitely stretching the reality of what this 'attack' is capable of IMHO.
Still quite anon. He almost certainly used a VPN, and if he didn't he likely lived in a major city which included thousands if not hundreds of thousands of capable engineers. If it said he was in SF during some messages that would tell us literally nothing.
You can already do the same with advertisement ID in (almost) every single one of these applications.
... very anonymous because he was most likely using a VPN lmao
Not sure why so many top comments dismiss the severity of this. This is just exactly the type of attack that gives law enforcement or a malicious actor a way to establish proof of whereabouts.
I would guess some are just jealous of his age, but some do find the claim of de anonymizing to simply be overblown given it doesn't tell you nearly enough to find anyone except in very niche cases. This "attack" is easily defeated with a VPN or living in any major city.
You don't need to live in a major city. Cloudflare is never going to set up a caching proxy for a hamlet in the desert; you'll always be part of a huge group that a given caching proxy serves. The attacker can be happy if they can narrow the recipient's location down as much as to a single country
Posters are missing the point by projecting themselves into the scenario. Yes, it probably isn't a concern for someone living in the US or the EU. The calculus is different if you live in a smaller country, a politically sensitive area or are involved in activism against an authoritarian state.
Even for individuals in those large, developed suprastates, it opens the door for catfishing and other social engineering approaches.
Someone on GitHub called him out for making a Twitter account in 2017, since he'd have to be 8 years old at the time... I don't see what's so unbelievable about an 8yo making a Twitter account.
Interesting you touched on his age. I got extremely curious: why did the OP make such a flex (assuming they are telling the truth)? The first sentence is such a weird brag that it felt suspicious. The report is highly technical and extremely well written. We're either dealing with a pure genius or a fraud. But why would a genius flex? Doesn't make sense.
I can believe a very talented 15yo pulling this off. But the number of anonymous "I'm 15 and this is my impressive feat" posts on HN made me wonder if it's just a joke.
I don't know what you think is genius about any of this, but you're right, the flex is odd. It's something I've been seeing more and more of lately, and I find it off-putting, because, Back In My Day, I never had such a phase, where I felt like I should be given more credit for my 1337 h4xx0r skillz, because I was in high school or whatever—and I don't remember anyone else doing it, either.
I can only assume this is a consequence of modern social media having shifted the Internet from being a bunch of pseudonymous people making and sharing stuff, to everything being myopically focused on one's identity first, and what they do second (as is literally the case here).
And it looks like it works to achieve its desired effect, too—a significant portion of the comments here are congratulating the guy for doing such a thorough technical write-up, given his age. Maybe this is just me being a grumpy “old” man now, but I would've found that condescending when I was his age, and would've rather concealed my age than be condescended to as such. But, to each his own, I suppose.
I find it genius to be 15 and in a position to find this issue and write this report. To my understanding there is a ton of context and knowledge packed into this write-up, which I can't possibly imagine myself being able to grasp at the age of 15. That is not to say it's not possible, but it is very hard, and that's why I characterise it as genius.
Even knowing their country from this would be a big first step.
I believe most people (me included) dismiss the OP's claimed severity, as if it is being oversold. I see a balance of opinions saying "great find, but not as critical as claimed" so they don't seem dismissive. It is important to correctly classify the severity of issues. Proof of whereabouts is not deanonymization, especially when the abouts are so loose
They dismiss it for the same reason people dismiss disruptive new technology - they are uncomfortable with it. It's a signal (ha) that the threat is very real.
First dismiss it and see if the problem is still there in the morning. Hope that before then, someone finds a reason it's not a problem. Anyone?
Note: this person is the same 15-year old who found the Zendesk Slack takeover exploit a few months ago [1].
[1]: https://news.ycombinator.com/item?id=41818459
Given the twitter account was made in 2017, they would have been eight: https://x.com/hackermondev
And that bug report to Adobe was made when they would have been five years old: https://hackerone.com/daniel?type=user
I think that's just a quirk of HackerOne's username system. The username daniel was previously owned by another account (now known as daniel-hamid) which submitted a bug to Adobe. If you go through @hackermondev's tweets (starting in 2018) they are without question a kid (making games in Roblox and Minecraft) and then started to show an interest in hacking in 2020 (which lines up with when they created their HackerOne account). The claim of being 15 years old is plausible (presumably with parents / guardians who are accomplished in technology).
Why has Signal even enabled caching for those URLs? The most common case is going to be that the attachment is downloaded once, and that's it.
I would even expect that Signal wouldn't allow you to download it more than once, and would immediately delete it after the first successful download. Well, ok, maybe the client fails mid-way through, so allow some grace period for a re-download. But I can't imagine that would be the common case either, and so disabling caching on their CDN would fix this issue, and hopefully not increase their costs much.
At any rate, "deanonymization" is a bit clickbaity here. Narrowing someone's location to within 250 miles or so isn't great, but it doesn't deanonymize them.
Edit: I didn't think about the case where an attachment is sent to a group chat, where multiple people will be downloading it. But in that case wouldn't the attachment be encrypted individually for each person in the group? I'm not sure how this works, of course.
Signal's default setup is more usability focused while supporting E2E, and less about tinfoil hat threat models about being present on a continent you're a citizen of.
The items you mentioned can essentially be configured, for those that want the insane level of privacy / security. Messages can be auto-deleted 30 seconds after being seen, a proxy can be configured to route all your traffic through it, and tons of other things can be done to customize it more to the user's liking.
I'd imagine they're caching it because of egress costs. File attachments, voice mail, video, etc. can all add up.
> Signal's default setup is more usability focused while supporting E2E
If images/attachments were e2ee, this problem probably wouldn't exist, right? or are the images on cloudflare encrypted?
Edit: I should clarify. I didn't mean the encryption itself fixes the problem, but rather that: If this were handled like the text messages we send (not via cloudflare CDNs) then this wouldn't exist. I get that attachments are quite some bytes bigger than text but shouldn't the security guarantees be the same?
I actually also wondered about this, because if Signal does not encrypt attachments and delivers them via CloudFlare, that would suck, as CloudFlare could just look into all of them.
It seems that signal is indeed encrypting all attachments and therefore the encrypted attachments are cached and served via CloudFlare.
From what I know* (heavy on the asterisk there), they are. I'm guessing at their setup at this point, but it sounds like the "large" data is probably being stored (while encrypted) in a different way / separately than the messaging. Since it's supposedly E2E (not gonna pretend I've hand verified it), it's decrypted on the device, but it needs to be grabbed in the first place from said separate place.
So, I'm guessing the images are encrypted where they're stored. And from his post it sounds like it doesn't happen with the messages, so the motivation for using CloudFlare probably is around egress pricing, or they could be using CloudFlare R2 for storage as well.
Group chats and multi-device users maybe
Unless I'm missing something, this seems like an incredibly long winded way to check the users IP location?
For example, connecting to a VPN and checking https://cloudflare.com/cdn-cgi/trace gives me `colo:CPH` (Copenhagen) which is far from my nearest CF datacenter (geographically), closer to the IP location from my VPN provider (Oslo) but still not particularly close?
If I don't use a VPN, I don't even get the capital city of my country (which I'm in right now), I get a colo approx 250 miles north. So I also dispute that Cloudflare always returns the "nearest available datacenter".
Don't get me wrong, the write up is cool and certainly interesting - just not convinced on the real world applications here...
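For anyone who wants to repeat that check from a script, here is a minimal Python sketch. It assumes the trace endpoint returns plain `key=value` lines, one per line, which is what it looks like when I fetch it but not something I'd treat as a stable, documented format. `parse_trace` and `fetch_colo` are just names I made up for this.

```python
from typing import Dict, Optional
from urllib.request import urlopen

# URL taken from the comment above; the key=value response format is an assumption.
TRACE_URL = "https://cloudflare.com/cdn-cgi/trace"


def parse_trace(body: str) -> Dict[str, str]:
    """Turn the key=value lines of a trace response into a dict."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore any line that is not key=value
            fields[key.strip()] = value.strip()
    return fields


def fetch_colo(url: str = TRACE_URL, timeout: float = 5.0) -> Optional[str]:
    """Fetch the trace page and return its 'colo' field, or None if absent."""
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return parse_trace(body).get("colo")
```

Calling `fetch_colo()` with and without a VPN should show the same kind of mismatch described above: the code you get back reflects routing, not necessarily geography.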
> Unless I'm missing something, this seems like an incredibly long winded way to check the users IP location?
It's less accurate than that. IP Geocoding can be down to the city level in many cases. This is _maybe_ nearest cloudflare data center
>just not convinced on the real world applications here...
As a piece of data alone, the results are probably not of significant use.
The real-world application (and potential danger) is when this data is combined with other data. De-anonymization techniques using sparse datasets have been an active area of research for at least 15 years and it is often surprising to people how much can be gleaned from a few pieces of seemingly unconnected data.
> The real-world application (and potential danger) is when this data is combined with other data.
That's exactly the point. In this case it's only really possible to de-anonymize people who take long distance trips. But based on two data points it might be possible to know which flight or train a person travelled with.
With three different data points it might be quite unique. For example you might find out somebody travelled from Italy to Norway on Monday evening and then to France on Wednesday morning. There are probably not so many people who did a trip like that, it might come down to only one (or a handful) people who fits this itinerary. With other data sources it might be possible to uniquely identify this person.
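To make the itinerary argument concrete, here is a toy Python sketch of the narrowing step. Everything in it is hypothetical: the `records` side data (who was known to be in which country on which day) stands in for whatever other source an adversary might have, and the names and dates are invented.

```python
from datetime import date


def consistent_candidates(records, observations):
    """Return the names whose known (day, country) sightings include every observation."""
    wanted = set(observations)
    return {name for name, sightings in records.items() if wanted <= sightings}


# Hypothetical side data, e.g. from some travel or check-in dataset.
records = {
    "traveller_a": {(date(2025, 1, 20), "IT"), (date(2025, 1, 20), "NO"), (date(2025, 1, 22), "FR")},
    "traveller_b": {(date(2025, 1, 20), "IT"), (date(2025, 1, 20), "NO")},
    "traveller_c": {(date(2025, 1, 20), "IT"), (date(2025, 1, 22), "FR")},
    "traveller_d": {(date(2025, 1, 22), "FR")},
}

# Hypothetical coarse locations inferred from cache hits: Italy and Norway
# on a Monday, France on the following Wednesday.
observations = [
    (date(2025, 1, 20), "IT"),
    (date(2025, 1, 20), "NO"),
    (date(2025, 1, 22), "FR"),
]

# Show how the candidate set shrinks as each observation is added.
for n in range(1, len(observations) + 1):
    print(n, sorted(consistent_candidates(records, observations[:n])))
```

Each observation on its own matches several people; only the combination singles one out. The hard part in practice is getting the side data, not the set intersection.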
>The real-world application (and potential danger) is when this data is combined with other data. De-anonymization techniques using sparse datasets have been an active area of research for at least 15 years and it is often surprising to people how much can be gleaned from a few pieces of seemingly unconnected data.
Seems pretty handwavy. Can you describe concretely how this would work?
>Seems pretty handwavy.
It has a whole Wikipedia article and everything.
https://en.wikipedia.org/wiki/De-anonymization#Re-identifica...
>Can you describe concretely how this would work?
Here's one of the earlier papers I remember off-hand, demonstrating one methodology. New (and improvements to existing) statistical techniques have happened in the ~18 years since this was published. Not to mention there is significantly more data to work with now.
https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf
"We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset."
From the Wiki I linked:
"Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them." [...] "A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."
Point being that operational security is hard, and it takes a lot less to "slip up" and accidentally reveal yourself than most people think. Obtaining a location within 250 miles (or whatever) can be a key piece of information that leads to other dots being connected.
Other examples (albeit with less explanation) include police take downs of prolific CSAM producers by gathering bits and pieces of information over time, culminating in enough to make an identification.
>"We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset."
> [...]
"Researchers at MIT and the Université catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, was enough to uniquely identify 95 percent of them." [...] "A few Twitter posts would probably provide all the information you needed, if they contained specific information about the person's whereabouts."
The only reason the two attacks work is that you have access to a bunch of uncorrelated data points. That is, ratings for various shows and their dates, and cellphone movement patterns. It's unclear how you could extend this to some guy you're trying to dox on signal. The geo info is relatively coarse and stays static, so trying to single out a single person is going to be difficult. To put another way, "guy was vaguely near New York on these dates" doesn't narrow down the search parameters by much. That's going to be true for millions of people.
>To put another way, "guy was vaguely near New York on these dates" doesn't narrow down the search parameters by much.
That's why I said that this data alone is probably worthless, but can gain value when combined with other data.("As a piece of data alone, the results are probably not of significant use")
The combining of data is the important bit and the entire emphasis of both of my other comments.
Two pieces of otherwise anonymous data can, when combined, lead to re-identification.
>Two pieces of otherwise anonymous data can, when combined, lead to re-identification.
How are you going to get more anonymous data? Practically speaking if your target has such poor opsec that he's hemorrhaging bits of data, you probably don't need this attack to deanonymize them.
>How are you going to get more anonymous data?
All over the place? Your comment history here (and mine!) is full of data. Each piece alone isn't identifying, but there's a good chance that in aggregate it is.
If you share that username on discord/twitter/reddit/steam/whatever, that's even more data. If you reference old accounts anywhere, you guessed it, even more.
>you probably don't need this attack to deanonymize them
My comment wasn't necessarily specific to this attack, just noting that this attack can be an additional piece of data in the chain of re-identification.
You've gone from "not convinced on the real world applications here" to "how are you going to get more anonymous data". If we assume that you can get some data somewhere (a small list of example sources above), can we agree that there is, possibly, a real world application?
Do you not buy that a user's IP location needs to be protected?
There is a reason applications go to so much effort to proxy requests to resources such as images. It's not free to do this.
Having your IP address not revealed to people that can message you on Signal seems like a pretty reasonable privacy expectation.
Your IP isn't revealed though, only your vague geographic area.
That's marginally better, but can still be a problem. Just consider e.g. a whistleblower working for a company with a very small satellite office in a given country.
Did you even read it? There's no IP leak. And if you're a high target, then using some kind of proxy is literally the first step you take. The attack is nothing but an exaggeration and has no merit in real world
Yes, I read it. Information about your IP address is leaked, as that's how Cloudflare routes you to a given datacenter.
And I strongly disagree that being able to uncover somebody's rough geographic location is not a privacy problem.
I wouldn't be surprised if this, for example, lets you deduce if somebody is currently home, at work, or commuting (as all three ISPs might be hitting different Cloudflare datacenters). That's not information everybody is comfortable broadcasting to the world.
If you aren't comfortable broadcasting it, then maybe take measures so that it doesn't get to that point. Privacy is not by default, ever
To quote Signal themselves:
> Privacy isn’t an optional mode — it’s just the way that Signal works. Every message, every call, every time [1]
While I don't consider this a critical bug requiring an immediate technical remediation from Signal, this should definitely be either fixed or called out in the documentation at some point.
[1] https://support.signal.org/hc/en-us/articles/360007320391-Is...
The sentence before the one you quoted gives the essential context:
> Signal conversations are always end-to-end encrypted, which means that they can only be read or heard by your intended recipients.
They're not saying that it is an anonymisation proxy, they're saying the messages and calls are encrypted for the recipient rather than to the server
They also use AWS so good luck using it on your actual IP
Privacy by default is Signal's entire brand
Weather predictions are the weather channel's entire brand, but people understand the concept well enough to know that this doesn't mean it's infallible. There is a limit to how many warning stickers we need in the world. If you want to rely on a particular feature, maybe check that the product supports said feature. Signal does encryption, not onion routing
I guess it can be useful for tracking fugitive political dissidents, terrorists, etc. If you can narrow their location down to 250 miles, it's already very useful information. And without raising any suspicions.
It's not really narrowing it down to 250 miles; it's narrowing it down to a circle whose radius is at least 250 miles or ~196,000mi^2.
My closest Cloudflare CDN is just listed as "DFW". The DFW metro area is about 8,700mi^2, and I imagine I could be even further than the "metro area" and still get the "DFW" Cloudflare datacenter.
In their little video animation, the area inside the overlap of those two circles encompasses several states. The edges of the two circles go from Washington to Florida and almost include Chicago. The target could have been in Denver or St Louis or Las Vegas or Phoenix or San Diego or San Francisco or Amarillo or El Paso.
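The arithmetic behind those figures, as a quick Python check. The 250 mile radius and the 8,700 square mile DFW metro figure are the ones quoted above; the function name is just for illustration.

```python
import math


def circle_area_sq_mi(radius_mi: float) -> float:
    """Area of a circle in square miles, given its radius in miles."""
    return math.pi * radius_mi ** 2


search_area = circle_area_sq_mi(250)   # pi * 250^2, roughly 196,350 sq mi
dfw_metro_area = 8_700                 # sq mi, figure quoted in the comment above

# How many DFW-metro-sized areas fit inside the 250 mile circle.
ratio = search_area / dfw_metro_area

print(f"{search_area:,.0f} sq mi, about {ratio:.1f}x the DFW metro area")
```

So even in the best case the circle is more than twenty times the size of one of the largest metro areas in the US.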
I think it's still useful. Going from "we don't know where Osama bin Laden is at all" to "he's somewhere in Pakistan".
If only we knew OBL's Discord handle then we would have known he was about where we figured he was all along...
And then this whole thing gets thrown off if one uses a VPN with an endpoint somewhere other than where you are. Click a button, suddenly my datacenter is AMS. Click it again, suddenly it's OTP...
>If only we knew OBL's Discord handle then we would have known he was about where we figured he was all along...
Discord is just an example, this can apparently work with many apps that store user attachments on Cloudflare.
>Click a button, suddenly my datacenter is AMS. Click it again, suddenly it's OTP...
Well, if the location keeps changing, it's obvious it's not their real location. But if it’s always the same, no matter what, that’s a huge clue. Of course, this works best when you’ve got some other data to back it up. It’s kind of like playing Akinator - the more answers you get, the closer you get to figuring out the target. One answer might not tell you much, but three or four?
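A rough Python sketch of that "does it keep changing or is it always the same" test. The 80% threshold and the example codes are arbitrary choices for illustration, not anything from the write-up.

```python
from collections import Counter


def dominant_colo(observed, min_share=0.8):
    """Return the most frequent datacenter code if it makes up at least
    min_share of the observations, otherwise None."""
    if not observed:
        return None
    code, count = Counter(observed).most_common(1)[0]
    return code if count / len(observed) >= min_share else None


# Hypothetical series of cache-hit locations collected over several days.
stable = ["DFW"] * 9 + ["SFO"]            # mostly one place: a strong hint
hopping = ["AMS", "OTP", "DFW", "AMS"]    # looks like a VPN user

print(dominant_colo(stable))
print(dominant_colo(hopping))
```

A stable answer is only a hint, since a VPN user who never switches endpoints looks exactly the same.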
In their example target it pinged two datacenters, one in Dallas and one in San Francisco. Their requests might bounce between datacenters even if they aren't on a VPN.
This assumes that Osama bin Laden has poor enough opsec that he's using (eg.) Discord without a proxy. State actors have much more sophisticated techniques available.
(It's still an interesting vector, though! But it's true that the headline and writeup are a bit sensationalized.)
This is certainly an "attack" but not one you'd normally associate with zero click. There is no code execution, but some tricks to see which Cloudflare datacenter cached the image -- giving a very rough area the user is in. Impressive and insightful nonetheless.
depending on the circumstance, the rough area might already be useful to adversaries of the person trying to hide. I wouldn't expect things like criminals etc. to suffer from this, 300 miles is a big radius for example... but if you want to know if 'the guy is still in country' or something like that (for instance law enforcement) it's useful for them. such parties could then collaborate with local resources to do further investigations. knowing which local resources in what area to enable might save a lot of 'costs'.
as you said, impressive and insightful. :D kinda feel like the docs on it were a bit chatGPT aided, they are super clear and full of 'certain sentences'. (this is totally an excellent use-case for that, so not bashing on it at all!).
nice read.
You would know if they are over a cellular network or checking on mobile.
If someone sends you a youtube link and you hit play, YT knows who you are, both from a network perspective and potentially the logged in user.
If you are using signal in a high risk environment, you should be using it from a system that contains no extra information about you. This is the same posture one should take when using Tor.
Basic opsec.
I don't think these kinds of things are in Signal's threat model. It is meant (I think?) as a messaging platform for people with nothing to hide.
i don't think you can call opsec basic, since it requires tons of knowledge about technology and techniques adversaries might deploy against you. targets of attacks don't necessarily have this kind of knowledge.
opsec is _incredibly_ hard for a person not deeply into technology and this type of information. you might argue that you need to stick with certain tools and techniques that are known good, but new vulnerabilities and techniques implemented against you can completely shatter previous knowledge on what's good and bad opsec and still break it despite doing it 'very well'. (like certain darknet markets being closed down due to new vulnerabilities being found in the platforms they use...)
most people who rely on opsec/tradecraft for a living, also rely on teams of people to help them maintain it and validate it constantly... (or eventually fail and get bitten).
you are right though that its unlikely a company or app producer would have a threat model tuned to people who want to hide stuff. those things generally tend to be closed down sooner or later. (encrochat and such services...)
You are absolutely right, I think it should be basic opsec, but is probably advanced opsec seeing how many folks get tripped up by this stuff.
This means never using a browser context in which you have ever logged into any service that is personally identifying. That also means the order in which you load pages. If your ritual is to open Pinterest followed by Slashdot, that is now your fingerprint.
It isn't just what you do, but how you do it and the ordering between those events. You also don't want to accidentally deanon yourself or your peers, even when everyone is trusted because it also leaks group membership information.
The mental framework for opsec can be modeled as vector calculus and differential geometry. You have to think of the flow of information across a surface and in the integral of that flow. Assume an adversary with perfect total information.
Law enforcement could probably just ask cloudflare for the exact IP address that retrieved the attachment.
Only if they're from a friendly country. If the reason a user needs anonymity is geopolitical, that isn't a guarantee.
Anyone can do this as per the TFM, which is an excellent read.
do you think law enforcement in Iran will get an answer from cloudflare?
So many comments get caught on the wording 'deanonymization'. Is there a standardized definition of 'deanonymization' across industry experts, privacy-conscious people and hackers?
For many commenters, it looks like deanonymization means unveiling highly sensitive info like name, address, email, etc.
For privacy-conscious individuals and hackers, it looks like it means 'revealing a data point that shouldn't be revealed'.
As a signal or Discord user, I would expect my country location not to be revealed to a person I don't know. So the latter definition makes sense to me.
As you say, it depends on the person but I think for most people an acceptable definition is "deanonymization reveals PII". What qualifies as PII depends on the context/jurisdiction but typically an IP address would be considered PII whereas country (or a similar broad region) would not.
https://en.wikipedia.org/wiki/Personal_data
I'm a bit at a loss there. Has _anyone_ ever considered Signal to be anonymous? Or Discord? If so, I have bad news: they are not anonymous. At all. Not even slightly anonymous. Nor did they ever claim to be, they only claim to not be able to read your messages (Signal claims that, I don't know about Discord, I doubt it). And that claim has flaws (sure the crypto is sound but have you thoroughly reviewed and compiled the version you are using right now?)
At the very best, they are weakly pseudonymous, but that's about it. And yes, loading media by default has always been a staple of applications who prioritize their users' convenience at the expense of some security, a fine choice for the usual threat model of their users. And embedding media in messages has always been a staple of deanonymization attacks.
So ok, the tracking pixel has been shown to still be a relevant technique today, that's nice but not surprising.
If you want to remain anonymous though, don't use Discord or even Signal, and I'd advise against posting on HN either. Maybe, if you automate the pasting of messages (no js!) that have been reworded by a local llm from throwaway accounts through whonix, at random times that can't be correlated to your timezone, you _might_ have your chances. Don't bet on it.
Anonymity does not exist any longer.
I am currently banned from the Signal subreddit for pointing out that we only have Signal's word that they don't collect metadata. So, yeah, people do consider Signal anonymous...
This is just the fundamental way the internet works, and is the reason that anonymizing proxies like Tor exist.
If you don’t want people to be able to detect your rough geographic location, you should be using a proxy to hide it. For everybody else, knowing the edge server you are closest to is really not a threat.
No, it isn't. This is Cloudflare exposing metadata when it really shouldn't. Having a configuration option or an origin response header akin to CloudflareCache: private or something is trivial for them to implement.
The same information would then be available in the timing, but given the distributed nature here, that would be a lot harder to pull off.
People for whom it's a threat don't necessarily understand anonymizing proxies - very few do. Signal is supposed to provide security for those who do not.
Where does Signal claim that, or who decides what they're "supposed" to provide?
If wishes had wings, sheep would fly. People who want their computer to do a certain thing can also be expected to do a quick web search for how to make it do said thing. E.g.: hiding location? Use onion routing. Signal doesn't claim to hide your country (heck, they require your phone number!) so it seems wishful thinking to say they should have included e.g. a Tor client and enabled it by default
There's a real difference between Discord itself knowing your location and any Discord user in the world knowing it. Just like there's a difference between the VPN provider knowing your ipaddr and every website you visit knowing it.
What is the benefit of caching images in a cdn for Signal?
Assuming local client-side caching, the total number of requests for that resource should be very small, probably one in the vast majority of cases.
On an unrelated note, it seems like Cloudflare could very easily fix this by not returning the cf-ray header, or at least having an option for the customer to remove it. Although, it might still be possible to get that information based on timing information...
> it seems like Cloudflare could very easily fix this by not returning the cf-ray header
Then you just look at the response time. If the resource needs to be fetched from another continent, this is probably reliably measurable
Same for websites trying to hide which users exist: do a login request for an existing username and it'll do the password hashing (usually adds at least 50 ms to the response time), whereas for an invalid username it early exits. The fix is to always run the same code, so always do the hashing, which very few sites do. (Or not care about revealing this and telling people straight out that their username is unknown, if that fits with your threat model.) So to get back to Cloudflare's case: it won't help unless they delay responses, which is the opposite of what they're supposed to do
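To make the "always run the same code" fix concrete, here is a minimal Python sketch. The user store, the dummy credentials and the iteration count are all made up for illustration; it only shows that the hash is computed whether or not the username exists, so both cases pay the same hashing cost.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical in-memory user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"alice": (_salt, hash_password("correct horse", _salt))}

# Dummy credentials used for unknown usernames so the hashing cost is always paid.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = hash_password("dummy", DUMMY_SALT)


def check_login(username: str, password: str) -> bool:
    known = username in USERS
    salt, expected = USERS.get(username, (DUMMY_SALT, DUMMY_HASH))
    candidate = hash_password(password, salt)  # runs for every request, no early exit
    matches = hmac.compare_digest(candidate, expected)  # constant-time comparison
    return known and matches
```

The `known and matches` at the end matters: without it, an unknown username combined with the dummy password would log in.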
It isn't caching, it's CDNing. It is just an artefact of CDNs that they act as caches for the original content, and for improved distribution response time they cache to the nearest server from the response. ('Nearest' being an approximate heuristic, it is a property of the anycast route tables in the BGP routers the request passes through, it is actually a 'best route'.)
That caching is something you can turn off, at least for every CDN that I have worked with.
The Cache-Control http header has a `private` directive specifically to inform CDNs and similar not to cache the response.
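As a rough illustration of what that directive means to a shared cache, here is a simplified Python check. The function name is made up, and real caches apply more rules than this (see the HTTP caching spec); it only looks for the two directives that forbid a CDN from storing the response.

```python
def shared_cache_may_store(cache_control: str) -> bool:
    """Return False if the header forbids storage by shared caches such as CDNs.

    Simplified: only `private` and `no-store` are considered, and quoted
    directive values containing commas are not handled.
    """
    directives = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            directives.add(name)
    return not ({"private", "no-store"} & directives)
```

So an origin answering with `Cache-Control: private, max-age=600` would still let the recipient's own client cache the attachment, while an edge server should pass it through without keeping a copy.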
I don't believe the Signal app/network is choosing to cache images in a CDN?
But any user can send any other user a message that includes a link to a CDN-cached resource. Isn't that the "attack" here? Or am I misunderstanding?
Signal does cache them in a CDN. If the vulnerability was sending any link, you could just set up your own web server and get the person’s IP
Ah, and the attack is knowing what CDN that is that signal itself is using, and examining it directly? I had missed that somehow.
Yes, Cloudflare should allow customers to disable that header, and Signal shouldn't cache images sent to a single person, or even groups of less than a few hundred people.
> the total number of requests for that resource should be very small
"For that server" is the other number-of-requests..
So that law enforcement can ask Cloudflare for the IP logs... Signal is a joke.
https://simplex.chat/
Looking at the locations where Cloudflare has their servers [1] in the middle of Europe. With Geneva, Zurich and Munich there is definitely the possibility that this attack on Signal will leak whether someone is at home or not.
I don't understand how Signal could dismiss this so easily. I'm starting to get a bad feeling about their responses to these "low" stakes attacks. They already missed the ball on the database encryption mishap on desktop.
[1]: https://www.cloudflare.com/network/
So, it's like the [Spectre] attack against CPUs: trigger an access from a privileged context, check if the access has filled in some cache, infer privileged information from that.
It seems that time and again, security-enforcing procedures assume that many functions they invoke are pure, but in reality these functions have side effects, and these effects are observable much easier than the security requires.
The actual problem here is that the secured area is only the stuff that came through the encrypted channel. Any access beyond it, like following a link, is obviously insecure. If the link was sent via the secure channel, it becomes even less secure because it allows an observer to correlate the secure channel (otherwise impenetrable) with the insecure outside context, and to blow (some of) the cover. Opening links via Tor would mitigate it a bit.
The hard truth here is that almost everything may have observable side effects, so opsec needs to permeate all aspects of life, the more cover you need, the fuller. This is mostly incompatible with a convenient UX, but, to be popular, a secure messenger has to be reasonably convenient. This necessarily limits the level of security attainable by its casual use.
[Spectre]: https://en.wikipedia.org/wiki/Spectre_(security_vulnerabilit...
This is quite a detailed write up. I went through the post quickly, but didn’t get why Signal would just download an attachment from an unknown number/contact without first prompting the user to accept or deny the conversation request. I’ve seen conversation requests always waiting for me to accept or not. If I don’t accept, I don’t see any messages on that chat and the other person doesn’t get any indication of message delivery. What have I missed?
If the message is from a known or trusted contact, I think there can be larger problems than just a rough location reveal.
> didn’t get why Signal would just download an attachment from an unknown number/contact
Usability, most likely. Ultra-secure and paranoid doesn't result in good UX most of the time.
Push notification thumbnails. It's mentioned in the article ...
>I went through the post quickly, but didn’t get why Signal would just download an attachment from an unknown number/contact without first prompting the user to accept or deny the conversation request.
Where are you getting the impression that signal auto-downloads attachments from an unknown number/contact? The OP says there's auto-download, but not that it happens from unknown contacts.
> I went through the post quickly, but didn’t get why Signal would just download an attachment from an unknown number/contact without first prompting the user to accept or deny the conversation request.
I guess you went through the post too quickly, because it goes over how that's exactly how it works. Unless you have push notifications enabled and on default settings to include the content in the push notification.
Congrats on finding this. Very impressive for a 15-year-old!
The section "How to Protect Yourself" is lacking.
Step 1. Don't receive this information in the push message. Only send the fact that there is something waiting for you in the app. Chances are there are other vulnerabilities that compromise the end-to-end encryption guarantees provided by the app (and only by the app).
In Signal on iOS: Click on your icon in the top left corner. Click on settings. Click notifications. Click on display below "message contents". Make your choice.
Another situation where convenience clashes with security, unfortunately.
Step 2: If you use Discord, don't allow invites from _anyone_.
It's quite bizarre why social media apps allow anonymous people to interact with you. 99% of the conversations I have are with people that I roughly know.
Just don't use it. Expecting privacy and security from a literal keylogger. Who in their right mind uses a plain-text messaging app in 2025?
How is Discord a key logger?
Most people, actually.
If you're worried about anonymity and you're using discord, you're failing. It's not made for that.
>It's quite bizarre why social media apps allow anonymous people to interact with you.
I mean, it's one of Discord's major use-cases. Joining a server of a common interest and meeting/talking with other people that share that interest.
Discord is for gamers and quite a lot of people will be playing a game and tell someone "add me on discord my tag is xyz". Not allowing invites would seriously cut into the usability.
You could have it so both people have to add each other before there's any indication that either person added the other.
No extra work for person A, and the work for person B is just what person A had to do anyway.
This is mostly unusable for what should be obvious reasons.
I'm... not actually clear on what those reasons are? For the adder, the experience is exactly the same - the only difference is that there's no longer an adder and an addee - instead there are two adders.
> It's quite bizarre why social media apps allow anonymous people to interact with you
Bit strange to attribute this to 'social media apps', isn't it? I'm interacting with an anonymous person right now. Most platforms allow it, including the older ones (i.e., IRC)
So, how would you start interacting with your friends if you just created account?
>anonymous people
Wtf, how is this even relevant?
You can add them by creating unique, temporary UUIDs/links that they can use?
You know them from somewhere else, let's say I play a game and we decided to get into a voice chat. We could create a temporary, dynamically created voice chat that we can all join (much like Google Meet) where all of us are anons.
Then, if we really want to know each other, we can then share the UUIDs.
I understand why ANYONE can send an email to me (I can decide when/whether to check them)
I don't understand why ANYONE can whisper in my ear (I can't decide since they are pushed to the top of the app)
Adding this level of friction to the process is not viable for a messaging platform whose bread and butter is connecting with friends.
If I use Signal or Discord to send someone a link to anything hosted on a server controlled by me, provided that the user opens the link, I will get an exact IP address of the user. IP address is much more useful in de-anonymizing the user than the nearest CloudFlare datacenter location.
A fun attack, but I don't think this is a significant improvement over the existing state of the art using delivery receipt timings ("Hope of Delivery"). https://arxiv.org/pdf/2210.10523
Am I correct in surmising that someone who uses a VPN on their phone, while sending Signal messages/content, would be cloaked, provided the VPN server they pick isn't near them?
Yes, that is correct. VPN near location would be disclosed, not yours.
Usually, being identified as being part of such a huge group that there is no chance of being found is an example of anonymization, rather than deanonymization. The author might not like that there is any potential to narrow things down at all, but the information provided by this could be easily wrong if a VPN were used to have the traffic egress through a different geographic region.
Hmm "within 250 miles" is not deanonymization in my book. Unless you live in the middle of the desert. In which case there won't be a cloudflare DC near you anyway.
It's nice but at most will give you an indication of city. Perhaps together with some additional OSINT you could find the user but you'll need a lot more clues.
Well found though!
> When a user sends an attachment (e.g., an image) on Signal, it is uploaded to cdn2.signal.org.
Why is that even the case? I had understood that (binary) attachments are embedded into the encrypted message and hence transferred directly from sender to receiver.
Obviously, retrieving media from an external location saves bandwidth at multiple positions. I am not a security expert, but it seems almost trivial to see how storing message data on an external server conceptually facilitates attacks like this one. Isn't that the same reason a link preview is generated at the sender first and then embedded into the message as an image?
Clever finding but the title does no justice to the actual attack. Even a bare minimum threat model requires a user to use a VPN or Tor, which completely eliminates your "0day". Signal rightfully declined your report because its only job is to provide secure communication
Signal is definitely also aiming to provide metadata privacy, which they understand to be part of secure communication.
Otherwise, they wouldn't pad attachment and message sizes, offer a "sealed sender" feature, allow relaying all calls to avoid callers/callees from learning users' IP addresses etc.
Signal is intended not for HN readers, but for ordinary people who don't understand VPNs and Tor.
While not 0-click, this might work even better using DNS and a more dense network of anycast DNS servers delegating a subdomain. Send a link to the target, and the DNS resolve should end up at your anycast DNS server. Respond with a CNAME entry, triggering a second DNS request and you can determine at which DNS server the request was served.
Would also work without anycast (and thus probably able to use a very dense botnet) and long list of NS entries for your domain.
I guess I'm not so "crazy" for funneling all my Android's outbound traffic through a VPN that does two hops.
Whether that's crazy depends on your threat model. If there's no reason, it could still be crazy in the sense of protecting from an irrational fear. If you communicate with people or organisations who shouldn't know your location, it makes sense. It depends
I believe in reality it's a bit more complicated
CDNs do not choose datacenters for users based on geographic distance. The number one metric is latency, but latency != physical distance. The second metric is optimizing the price of data transfer between peers and IXPs, which results in very dynamic routing rules. Then consider also network/software hiccups/maintenance and the distribution of datacenters' load...
The accuracy of this geolocalization depends very much on peering agreements.
I don't know about the US, but this will not be very accurate within the EU.
As an example: In Hungary, there's pretty much only one peering hub (bix) and there's only one Cloudflare datacenter. You've already geolocated me better than this hack just by knowing my language or phone prefix.
It's not because you have a Hungarian number that you're not travelling somewhere else. I don't really understand the point.
When I am traveling, I most likely use my mobile data. That data is tunneled to my mobile provider, exiting to the public internet at exactly the same server.
In my case, Cloudflare will identify me as BUD even when I'm roaming in a different country.
This behavior is very typical for the EU, because the telco landscape is fairly fragmented, and each company typically has only one, or at most 2 peering locations.
This may be different within the US where the distances are bigger, and latencies matter more, so there is more incentive to peer locally.
What's old is new. Does anyone remember the forum signatures that would display the viewer's IP address and location on a little wooden signpost held up by a troll-looking creature?
https://cdn.geekzone.co.nz/images/forums/danasoftcache.jpg
I was fascinated by this once I learned how it worked. At the time I was learning php and wrote a script that would draw graphics based on the requesting ip address and return as gif, then used that as my avatar on a few phpbbs. Learned a lot.
Had a friend who made his own nice one, would then visit the thread, and figure out "who is viewing it" and show your username. ;)
Just like the days of DoSing an IP from a COD Lobby
That was a troll feature. It usually showed any user his own information.
MAYBE some forum doxxed users by posting their information, but I didn't see any.
My friend would figure out the username, but he never did it maliciously, just for the challenge. Forums would show you which user was viewing a thread...
Wikipedia still does if you aren't logged in
This is pretty interesting, and well documented. Great work! I wonder if there is a way to turn off notifications or if the approach is to simply not run such apps.
Not sure about mobile apps, but in Discord desktop there is an option under "settings -> notifications". Your browser may also have notification settings that would help.
This changes the attack from a 0-click attack to a 1-click attack.
It seems to me that a key requirement for this attack is that both the attacker and the victim load the same link, that is, that the attacker knows the URL the victim is going to load. If Signal/Discord created a different link to be given to the victim, and never shared it with the attacker, this attack wouldn't work.
That could be as simple as adding some extra pseudo-random parameters to the URL which will be ignored by the origin (but honored by the caches), or as complex as creating a completely separate URL for the receiver of the message, and somehow giving it to the receiver without giving it to the sender (easy on Discord, harder on Signal due to its end-to-end nature).
Since creating separate URLs would largely defeat the purpose of caching, a simpler solution would be to just disable caching, as Cloudflare suggested in their response.
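For the simpler variant above (an extra pseudo-random query parameter per recipient), a small Python sketch could look like this. The parameter name `r` and the example host are hypothetical, and it assumes the cache keys on the full query string while the origin ignores the extra parameter.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def per_recipient_url(base_url: str, param: str = "r") -> str:
    """Append an unguessable query parameter so each recipient gets a distinct cache key."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_urlsafe(16)))  # 16 random bytes, URL-safe text
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical attachment URL, for illustration only.
print(per_recipient_url("https://cdn.example.org/attachments/abc"))
```

The sender never learns the recipient's variant, so probing the sender-visible URL says nothing about which edge the recipient hit. It also means every recipient is a cache miss, which is the "defeats the purpose of caching" trade-off mentioned above.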
I guess one possible fix would be for cloudflare to implement an option to disable the x-cache header for unauthorised users. This way Signal devs could still check their setup by sending authentication headers.
But it wouldn't solve the issue completely because you could always check the response time. Probably Signal should disable caching. I guess it's rare for someone to repeatedly download an attachment. Once it's there it's there. For grouped conversations it could be an issue though.
Not sure it's so rare. A large number of group chats will have people in the same area. For me it's the vast majority: family chat, groups of old classmates or flatmates are mostly in the same country, work chats too... I can think of one exception where a group member will consistently be hitting a different Cloudflare node from everyone else, but for everyone else, every time I send a picture into a group chat the caching will save traffic
Cool writeup by a 15yo, except for the way it completely oversells in the title.
Basically this allowed an attacker to find out which cloudflare data center a victim connected to when being tricked into loading something from cloudflare. This is often within a 250 mile radius of where they're living but not necessarily.
Can't one find out someone's IP just as easily by making them make a request to a URL controlled by an attacker? Is the problem that cloudflare is whitelisted for 0-click?
> Can't one find out someone's IP just as easily by making them make a request to a URL controlled by an attacker?
Unless you can find another flaw in Signal, that'd likely be a 1-click attack, which is less valuable than the 0-click attack demonstrated by the author.
Might even argue that the title is good because it made us click
I remember iOS not always respecting VPNs. Do these notification attachments get loaded through a VPN?
There was mention that the Teleport tool no longer works after the bugfix of the underlying issue (calling other cf locations via Workers and an internal subnet). It seemed like the ability to query which caches HIT on the dye-test image relied on being able to call out to each other DC.
Without this control over the route (driving the probing of which caches were hit), the attack would no longer work, right?
There is another method to query the caches. This is mentioned in the article.
Ah, the VPN deployment which probes from various geographies? It has limited coverage (according to author, about 54% of all Cloudflare datacenters) but still a sometimes-working attack, granted.
However, Cloudflare are known for being harsh on VPN exit points and the behavior of requesting the same (unique each pass) image from every geography and then never again, would probably look significantly suspicious, but yeah it seems not to be a priority for cloudflare at the moment.
"deanonymization" in this case is just plain wrong, you can't even tell which country the user would be in for sure. Also any proxy/vpn will completely protect against this.
It's "a very rough estimation of a user's location when they are not using a vpn".
2 questions - why do airports get cached with Cloudflare requests, and, if I use a VPN, am I getting content from my usual Cloudflare centre or the one from the country on the VPN I’m using?
Wouldn't any other user who sees the other person's profile picture also warm up the cache? This wouldn't work for someone in a large server.
The attacker uses a patched version of Signal to be able to intercept requests and to block a get request to the attachment they have just created. At least it is my understanding.
That’s just to be able to use their APIs to get the location of the sender.
For example, you use the normal Signal app without the patch and send me a message, and I have the patched version.
Just to remove certificate pinning, to be able to see the API traffic because of encryption.
I'm not sure how much if it makes any sense.
Guess what: you don’t need cloudflare
Is he just 15? The level of technical details, and this part is not that simple: “quickly patched the Signal desktop app to remove SSL pinning and configured Burp to intercept and view HTTP requests/responses sent through the app”
You’d be surprised at how adept the younger generation can be, especially those who’ve grown up with technology. As tech evolves, so do they. There are kids who genuinely apply themselves, and because they’ve been immersed in this environment, it’s practically second nature to them. I remember the late 1990s: I was young, but more than anything, I was curious about how things worked, I had the luxury of time, and access to technology to explore it. I started coding in C++ when I was around 13, and honestly, I still feel like I started too late.
There are also a lot more kids doing this than before. Like, I was one of 12(?) students in our high school AP Comp Sci course, then just one year after, 120 students took the same course.
How is it different than sending someone an image hosted on your server which is a tracking pixel and just get their IP+location?
This will be more accurate than the cloudflare approach.
Well, unlike with tracking pixels, you are not in the direct request path and cannot block it. You also have no way to monitor/log if it is happening (like you can in theory with a packet capture).
It's obvious in hindsight, but I bet no one would have mentioned this possibility as why you should disable notification previews or that simply receiving a notification would possibly reveal this information.
If your target is savvy enough not to click random links sent by strangers, it's hard to get them to load it. Many apps have caught onto the tracking pixel technique. It used to work for iMessage long ago.
You can't instruct a random Signal client to fetch a random URL. Here's how this attack works:
1. Attacker sends novel image to Signal
2. Signal hosts the image on their core servers
3. Signal instructs victim to fetch preview of the image
4. Victim asks the CDN for the image
5. CDN gets the image from Signal core servers and caches it
6. Victim gets the image from the CDN and displays the preview normally
7. Attacker hits every one of the CDN cache servers
8. The CDN cache server that says "yep, saw that already" is the one closest to the victim
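The steps above can be played through with a toy in-memory model in Python. Nothing here talks to a real CDN: the class, the edge codes and the URL are all made up, and the victim being routed to DFW is an assumption of the example.

```python
class EdgeCache:
    """Toy model of one CDN data center with a local cache."""

    def __init__(self, code: str, origin: dict):
        self.code = code
        self.origin = origin
        self.store = {}

    def fetch(self, url: str):
        """Return (body, status); status is 'HIT' or 'MISS'."""
        if url in self.store:
            return self.store[url], "HIT"
        body = self.origin[url]  # step 5: pull from the origin...
        self.store[url] = body   # ...and cache it locally
        return body, "MISS"


def locate(edges, url):
    """Steps 7-8: probe every edge once and report those that answer HIT."""
    return [e.code for e in edges if e.fetch(url)[1] == "HIT"]


origin = {"/attachments/unique-id": b"image bytes"}  # steps 1-2
edges = [EdgeCache(c, origin) for c in ("ORD", "IAD", "DFW", "FRA")]
edges[2].fetch("/attachments/unique-id")             # steps 3-6: victim is served by DFW
print(locate(edges, "/attachments/unique-id"))       # prints ['DFW']
```

Note that the probe itself fills every cache it misses on, so in this model the attacker gets exactly one clean reading per unique URL; a second `locate` call on the same URL reports every edge as a HIT.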
Can't you already see the IP of the datacenter that requested the image from your server/pixel and map it to the data center+location?
This is assuming the data center is directly requesting the source server which it might be given a few searches on Google [1].
[1] https://community.cloudflare.com/t/cloudflare-is-forwarding-...
Looks like Cloudflare are still sending out the airport locations and hit status on the response headers. Maybe I'm missing something but it seems like if you had a large VPN network you could run a distributed query to figure out which edge nodes have cached the url.
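For what each probe in such a distributed query would look at, here is a small Python helper that reads the two response headers. It works on a plain dict of headers rather than making any request; the sample cf-ray value in the usage line is made up, and it relies on the airport code being the suffix after the last dash.

```python
def parse_edge_info(headers: dict) -> tuple:
    """Return (datacenter_code, cache_status) from a dict of response headers."""
    lower = {k.lower(): v for k, v in headers.items()}
    ray = lower.get("cf-ray", "")
    # cf-ray looks like "<request id>-<airport code>"; take the part after the last dash.
    colo = ray.rsplit("-", 1)[1] if "-" in ray else None
    status = lower.get("cf-cache-status")
    status = status.upper() if status else None
    return colo, status


# Made-up header values, for illustration only.
print(parse_edge_info({"CF-Ray": "0123456789abcdef-DFW", "CF-Cache-Status": "HIT"}))
```

Collect that tuple from each vantage point and the edges reporting HIT are the candidates.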
That’s exactly what the author does towards the end.
I guess signal preview-loading or remote-image-loading features are always going to be usable to identify broadly what region a user is in, using this attack.
Can one disable those features in Signal? Would be annoying because they are nice, but yeah.
If you don't want that attack to be able to locate you somewhat (or at least locate your internet endpoint, if you are using a VPN or something), you will need to turn off signal previews and network image displays. Right?
Can probably achieve the same level of deanonymization by just monitoring what times the user communicates most often. Or send them enough links that they'll click on.
So the default option of using onion routes to hide your IP and location still works.
Not exactly the same type of attack, but very similar https://cyberinsider.com/timing-attacks-on-whatsapp-signal-t...
de-anonymization attack?
- The information extracted is a rough 250 mile radius around the user
- The attacker already has a way to contact the person (signal username / phone number)
Interesting reading, but also seems like technical clickbait.
I think all these things are absolutely ridiculous.
I use alpine (the email client, not the Linux distro). Before that, I used pine.
Every single thing that gets loaded from anywhere on the Internet has to be the result of an action that I take. Nothing ever gets loaded automatically. I get to choose if I load the thing using the server that I'm connected to, or if I load it directly on my local machine. I know the implications of each.
The fact that programs, particularly ones that are supposed to be for the security minded like Signal, load anything by default, automatically, is just, well, naive.
I can't be the only person who thinks that people who don't think these things through shouldn't be working on apps and email clients. Sure, people would have a cow if their email client didn't load every frigging thing and run remote Javascript and so on, but in Signal? Really?
(end rant)
I see that this can be turned off. I will now tell everyone I know that uses Signal that this should, in fact, be turned off.
That's a 15 year old.
I can't even conceive what governments are able to do. You could technically route Signal over the Tor network, but then even Tor has vulnerabilities in its C code.
"Luckily" my ISP is DTAG which has horrible peering with Cloudflare. So I'm routed through Warsaw (WAW) most of the time, even though there are multiple closer datacenters in Germany.
You could use this technique to see what geographic areas view what sites based on the content cache age, you would have to have the list of sites, but it would allow you to bucket a geographic by top sites from the test corpus.
Imagining the cloudflare datacenters as cachelines and this is just like a side-channel attack like spectre. Not as fine-grained but still cool stuff.
Unfortunate that Cloudflare patched the issue enabling specific datacenters to be targeted. Would have been extremely useful for finding the location of servers behind Cloudflare.
Just by the fact he's expressing distances in miles, I can say he's from USA.
That's my 0-click deanonymisation.
Nice attack otherwise.
>Just by the fact he's expressing distances in miles, I can say he's from USA.
And you could be falling into his trap of getting you to believe so by expressing distance in miles, as well.
Anyone send Snowden a push notification? Would be interesting to see if he's still in Russia...
I hate how teenagers can't help but post their age when it comes to achievements like it makes them special. 'hai im 15 and i hack billion dollar companies in my spare time.' This is cringe AF. I don't care if they're "only a teenager." Presumably, the age was written to signal how le special they are and not liek other teenagers. So if you want special treatment learn how to be modest and don't over-exaggerate your achievements. Any adult who managed to read past this sentence is a bigger person than I am.
I think it's good for finding out if someone is still in a certain region. More like region identification, not deanonymization.
Impressive write-up, especially for your age! Thanks for sharing :)
This doesn't strike me as a new 'attack' (I have to imagine there's even a name for such attacks), and 250 miles seems a large radius to 'deanonymize' someone, even a high-value target (even if such people didn't take any other measures to avoid being tracked...)
For reference, here's a 250 mile radius around Toronto Canada https://i.imgur.com/ydpR0IZ.png
Cloudflare's business model is fingerprinting as a service. Awesome.
Would be very interesting to see how other IM behave with this:
For example: Jami - one of the most feature-complete, distributed IM...
If you need to deanonymize a user who moves around a lot, this method makes sense.
“deanonymization”
Hardly.
Amazing sleuthing but not deanonymization.
Nice work OP, and congrats on HN front-page. Keep publishing or it never happened!
It's a classic timing attack. You can detect which Cloudflare datacenter is "closest" (ie. least network latency) to a targeted Signal or Discord user.
The speed of light is the main culprit here.
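A back-of-the-envelope Python sketch of why distance shows up in the timing: it computes a lower bound on round-trip time for a given fiber distance. The 2/3 factor for light in fiber is a common rule of thumb and an assumption here; real paths add routing detours and queuing on top.

```python
C_KM_PER_MS = 299_792.458 / 1000  # speed of light in vacuum, km per millisecond
FIBER_FACTOR = 2 / 3              # assumed: light in fiber travels at roughly 2/3 c


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a straight fiber run of the given length."""
    return 2 * distance_km / (C_KM_PER_MS * FIBER_FACTOR)


print(round(min_rtt_ms(1000), 1))  # about 10 ms per 1000 km, before any real-world overhead
```

So a fetch that has to go back to an origin on another continent cannot physically be as fast as one answered from a nearby cache, which is what makes the HIT/MISS difference measurable even without headers.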
Great job, you're going to go far Daniel.
why is the picture not simply cached near the sender as opposed to the receiver?
is there any good reason for deciding this way on the part of Signal et al?
The attacker can't be forced to make a request. In this PoC the attacker disabled their own outgoing image requests.
But that wouldn't help anyway, even if the image could be cached near the sender first, or the signal server prewarmed some other cache. After the victim opened the image, the attacker would see two locations that have the image cached, and could easily deduce which one is the victim's location (e.g. if Signal pre-warmed a random cache, repeating the attack a couple of times would be enough to eliminate the randomness).
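The "repeat it a couple of times" argument is just a set intersection. Here is a toy Python simulation, assuming one random decoy edge is pre-warmed per round; the edge list, the function names and the victim sitting behind FRA are all made up.

```python
import random

EDGES = ["ORD", "IAD", "DFW", "FRA", "AMS", "NRT", "SYD", "GRU"]


def observe_round(victim_edge: str, rng: random.Random) -> set:
    """Edges that report HIT after one round: the victim's plus one random decoy."""
    return {victim_edge, rng.choice(EDGES)}


def narrow_down(victim_edge: str, rounds: int, seed: int = 0) -> set:
    """Intersect the HIT sets of several rounds; only edges hit every time survive."""
    rng = random.Random(seed)
    candidates = set(EDGES)
    for _ in range(rounds):
        candidates &= observe_round(victim_edge, rng)
    return candidates


print(narrow_down("FRA", rounds=20))
```

The victim's edge is in every round's HIT set, while a decoy only survives if it happens to be picked every single time, so the candidate set collapses to the victim's edge after a handful of rounds.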
It's cached near the receiver for performance purposes, I assume, the same reason Cloudflare uses geographically local caches.
Surprised that was only worth 200 bucks.
this is pretty devastating for signal
Presumably Cloudflare will close the loophole for enumerating cache edges now.
very impressive findings
This is an extremely cool avenue of attack, I love the bot/demonstration.
Pretty impressive work.
Why does CloudFlare return whether it was a cache hit or miss? This information could be hidden/removed. I understand it's not a complete solution of the issue, because cached responses will return much faster than non-cached ones, but it's a step in the right direction.
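For context, the check in question is tiny. A sketch, assuming the response headers are already available as a plain dict (the helper names are hypothetical; `CF-Cache-Status` is the header Cloudflare uses to report HIT or MISS):

```python
def cache_status(headers):
    """Return the CF-Cache-Status value in upper case, or None if the header is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "cf-cache-status":
            return value.strip().upper()
    return None

def looks_cached(headers):
    """True only when the edge explicitly reports a cache hit."""
    return cache_status(headers) == "HIT"

print(looks_cached({"CF-Cache-Status": "HIT"}))
```

If the header were stripped, `cache_status` would return None and an attacker would have to fall back on response timing.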
is simplex immune?
Why does Signal even have that side channel?
Even Matrix encodes images and other data in the e2e p2p message flow.
impressive
> it's possible for an attacker to run a cache geolocation attack to find out which local datacenter they're near--similar to how law enforcement track mobile devices through cell phone towers.
very much disagree on this, they track mobile devices through your connection strength to multiple cellular towers while this attack proves which singular datacenter the victim is nearest.
Don’t get me wrong the write up is really interesting but it does feel like the author is a bit of a sensationalist.
While the detection area of the Cloudflare attack is bigger, I think the main problem here is that it's much easier to get access to it than to cellphone towers.
> through cell phone towers
Extremely sensationalist. A cell tower might have a range of a few miles, max. This is giving ranges of 250+mi.
> Don’t get me wrong the write up is really interesting but it does feel like the author is a bit of a sensationalist.
They claim to be 15 years old. Cut them some slack.
"bit of a sensationalist" is reasonable feedback; no slack needed. After all, this is how they learn.
people learn when they’re given kind, direct, actionable feedback from people they trust - not when they’re called sensationalists by random critics on the internet.
what have you made lately?
>people learn when they’re given kind, direct, actionable feedback from people they trust - not when they’re called sensationalists by random critics on the internet.
So what are we supposed to do? Dox him, find who his friends are, and use them to backchannel feedback? I think the "sensationalist" critique is direct and actionable - just don't do it.
I think "this writing is sensationalist" is constructive, actionable feedback.
And I think expecting that all criticism must come from people the target of it knows and trusts is a bit much.
> what have you made lately?
Plenty of stuff. But that's irrelevant. People are free to give feedback regardless of what they've been working on.
It would probably be better for such learning to occur in a place that doesn't create immutable records of judgments from one's peers; i.e. Hacker News comments.
Their twitter says
> Joined November 2017
so likely a bit older :)
It's not unfathomable for a precocious 8yr old to register for an account, with or without parental guidance.
Ah, that's true. They even have HackerOne activity from 8 years ago: https://hackerone.com/daniel/hacktivity?type=user
So either they lied about their age then in order to join social media and they're some sort of child prodigy... or they're lying now.
that's a hackerone bug, that 8-year-old report is not mine :)
It's clearly a different username on the 8 year old report.
Or that hackerone account has been traded.
No.
"Signal instantly dismissed my report"
"Telegram, another privacy-focused application, is completely invulnerable to this attack"
"Discord […] citing this as a Cloudflare issue other consumers are also vulnerable to"
"Cloudflare ended up completing patching the bug"
I wish Signal would react differently. I still remember the bubble color controversy when they changed their mind after the backlash and not before. :-)
> "Cloudflare ended up completing patching the bug"
This short quote fragment is a little misleading: Cloudflare patched the bug in their systems that allow you to send HTTP requests to any CF data center, regardless of where the originator of the request lives. This is likely something they want fixed for a large variety of reasons, some probably much more important than the specific attack OP wrote about.
> I wish Signal would react differently.
The severity of a potential security issue, or the determination of who is responsible for fixing or mitigating it, is a matter of opinion. Just because you think this is important for Signal to fix doesn't make it some absolute truth. At the risk of appealing to authority, I would expect people who run a security/privacy-focused messaging project to have a better handle on classifying these sorts of things than random people on HN like you or me.
But of course, sometimes they'll get it wrong too. I'm not familiar with the bubble color thing you mention, but sure, nobody's perfect; we're all human and we make mistakes. I'm personally not convinced Signal needs to do anything here. A 250 mile radius is quite a large area, and users can already choose to not auto-download attachments. To be fair, though, I think a simple way for Signal to fix this would be to disable caching on the attachments HTTP endpoints, though that might increase their bandwidth bills and increase load on their servers, depending on what their access patterns look like.
I just sent a feature request[1] to Signal with the following text:
[1]: https://support.signal.org/hc/en-us/requests/new
Hold on, someone else in this thread noted this does exist:
"You can disable the auto-download. Settings > Data and storage > Media auto-download, you can choose what to auto download for mobile data/wifi/roaming."
So, that part is there, but my question is: it's still an issue when they manually download the image, right? Unless someone never accepts images from anyone they aren't expecting, whose number or uniquely created ID has never been seen before.
Oh, nice. I looked under Settings > Privacy and didn’t see anything. For me it was under Settings > Data Usage.
Yes, this is still an issue if you manually download an attachment, but that’s a lot better than automatically when you open a conversation.
Signal is likely funded by the CIA and other intelligence actors:
https://yasha.substack.com/p/signal-is-a-government-op-85e
https://www.kitklarenberg.com/p/signal-facing-collapse-after...
https://www.city-journal.org/article/signals-katherine-maher...
https://drewdevault.com/2018/08/08/Signal.html
https://bigleaguepolitics.com/court-docs-show-fbi-can-interc...
Is there really any difference between dismissing the report or "citing this as a Cloudflare issue"?
Not in practice.
> There's clearly a problem here as Cloudflare says consumers are responsible for protecting themselves against these types of attacks, while consumers (ex. Discord) are putting the blame on Cloudflare.
>"Signal instantly dismissed my report"
>I wish Signal would react differently. I still remember the bubble color controversy when they changed their mind after the backlash and not before. :-)
Can you blame them though? They're a non-profit with limited manpower and resources. There's quite a lot of cranks in the security field, and as many people have echoed in this thread, the bug report is rather sensationalist. At some point you just have to pattern match and ignore any reports that seem a bit too cranky. Is this ideal? No. But I don't see how it's any different than summarily dismissing a vaccine skeptic's claim that vaccines are bad, even if there's a kernel of truth buried in there (eg. that benefits for young people are questionable).
For a 15-year-old, cool work!
But calling this de-anonymization is a stretch, if it can possibly pinpoint you within 250 miles (that's assuming geoip is correct too, which it rarely is).
In their GeoGuesser demonstration video, the highlighted area is densely populated and you would still need to match millions of people against the online user.
It does provide some hints as to the location of the targeted user, and that is cool!
De-anonymization would take monitoring over a period of time, but it could definitely work. Take this scenario for example: a person of interest is in the area of New York on Jan 1. On Jan 4 they travel to the UK. On Jan 7 they travel to Germany. On Jan 21 they travel back to the US.
The list of suspects would be fairly small when US officials cross-check individuals that travelled US-UK on Jan 4 and Germany-US on Jan 21.
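The cross-check in that scenario is just a set intersection. A toy sketch, assuming an investigator somehow holds one list of traveller identifiers per observed trip (the function name and every identifier below are invented):

```python
def cross_check(*traveller_lists):
    """Return the identifiers that appear on every one of the given lists."""
    sets = [set(names) for names in traveller_lists]
    return set.intersection(*sets) if sets else set()

# Hypothetical lists matching the Jan 4 (US-UK) and Jan 21 (Germany-US) trips.
us_to_uk_jan4 = ["p-104", "p-221", "p-385", "p-410"]
de_to_us_jan21 = ["p-221", "p-502", "p-777"]
print(cross_check(us_to_uk_jan4, de_to_us_jan21))
```

Each extra observed move adds another list to intersect, which is why the candidate set shrinks quickly for someone who travels a lot.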
It is already more than enough to know which country's authorities to contact and to pinpoint a rough area where to look.
If the scammer is in Nigeria, tough luck. If he is in the EU or US, then there is a feasible chance to go after the person.
> assuming geoip is correct too
It's not using geoip, it's using anycast.