Once upon a time, around 2001 or so, I had a static line at home and hosted some stuff on my home Linux box. A Windows NT update meant that a lot of Windows boxes had enabled this opportunistic encryption thing, where they would try to connect to a certain port and negotiate an S/WAN before doing TCP traffic. I was used to seeing this traffic a lot on my firewall, so no big deal. However, there was one machine in particular that was really obnoxious. It would try to connect every few seconds and just would not quit.
I tried to contact the admin of the box (yeah that’s what people used to do) and got nowhere. Eventually I sent a message saying “hey I see your machine trying to connect every few seconds on port <whatever it is>. I’m just sending a heads up that we’re starting a new service on that port and I want to make sure it doesn’t cause you any problems.”
Of course I didn’t hear back. Then I set up a server on that port that basically read from /dev/urandom, set TCP_NODELAY and a few other flags and pushed out random gibberish as fast as possible. I figured the clients of this service might not want their strings of randomness to be null-terminated so I thoughtfully removed any nulls that might otherwise naturally occur. The misconfigured NT box connected, drank 5 seconds or so worth of randomness, then disappeared. Then 5 minutes later, reappeared, connected, took its buffer overflow medicine and disappeared again. And this pattern then continued for a few weeks until the box disappeared from the internet completely.
I like to imagine that some admin was just sitting there scratching his head wondering why his NT box kept rebooting.
The lesson for any programmers reading this is to always set an upper limit on how much data you accept from someone else. Every request should have both a timeout and a limit on the amount of data it will consume.
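As a minimal illustration of that rule (the limits here are arbitrary, not from the story): cap both the time you are willing to wait and the total bytes you will read.

    import socket

    MAX_BYTES = 1_000_000      # hard cap on how much we will accept
    TIMEOUT_S = 10             # give up on slow or stalled peers

    def bounded_recv(conn: socket.socket) -> bytes:
        conn.settimeout(TIMEOUT_S)
        chunks, total = [], 0
        while total < MAX_BYTES:
            try:
                chunk = conn.recv(min(65536, MAX_BYTES - total))
            except socket.timeout:
                break              # peer is too slow: stop reading
            if not chunk:
                break              # peer closed the connection
            chunks.append(chunk)
            total += len(chunk)
        return b"".join(chunks)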
Around the same time, or maybe even earlier, some random company sent me a junk fax every Friday. Multiple polite voicemails to their office number were ignored, so I made a 100-page PDF where every page was a large black rectangle, and used one of the new-fangled email-to-fax gateways to send it to them. Within the hour, I got an irate call. The faxes stopped.
I enjoyed reading this, thank you for sharing. When you say you tried to contact the admin of the box and that this was common back then, how would you typically find the contact info for an arbitrary client's admin?
Tangent: I had a lazy fix for down detection on my RPi server at home: it pinged a domain I owned, and if it couldn't reach it, it assumed it wasn't connected to a network and rebooted itself. I let the domain lapse and the RPi kept going down every 5 minutes or so... I thought it was a power fault, then I remembered that cron job.
You'd be surprised: for the majority of NT installations providing services in that era, there were very, very few admins around to even notice what was going on. Running services like this on an NT box was done precisely 'in order to not have to have an admin', in so many thousands of cases that it can't be overstated.
Disclaimer: I put a lot of servers on the Internet in the 90’s/early 2000’s. It was industry-wide standard practice: ‘use NT so you don’t need an admin’.
Didn't get why that WinNT box was connecting to your box. Due to some misconfigured Windows update procedure?
That’s awesome! Thank you for sharing.
Back when I was a stupid kid, I once did
on my home page as a joke. Browsers at the time didn't like that: they basically froze, sometimes taking the client system down with them. Later on, browsers started to check for actual content, I think, and would abort such requests.
I made a 64k x 64k JPEG once by feeding the encoder the same line of macroblocks until it produced the entire image.
Years later I was finally able to open it.
I wonder if I could create a 500TB html file with proper headers on a squashfs, an endless <div><div><div>... with no closing tags, and if I could instruct the server to not report file size before download.
Any ideas?
Sounds like the favicon.ico that would crash the browser.
I think this was it:
https://freedomhacker.net/annoying-favicon-crash-bug-firefox...
I hope you weren’t paying for bandwidth by the KiB.
Maybe it's time for a /dev/zipbomb device.
Wait, you set up a symlink?
I am not sure how that could’ve worked. Unless the real /dev tree was exposed to your webserver’s chroot environment, this would’ve given nothing special except “file not found”.
The whole point of chroot for a webserver was to shield clients from accessing special files like that!
Could server-side includes be used for an HTML bomb?
Write an ordinary static HTML page and fill a <p> with infinite random data using <!--#include file="/dev/random"-->.
Or would that crash the server?
Divide by zero happens to everyone eventually.
https://medium.com/@bishr_tabbaa/when-smart-ships-divide-by-...
"On 21 September 1997, the USS Yorktown halted for almost three hours during training maneuvers off the coast of Cape Charles, Virginia due to a divide-by-zero error in a database application that propagated throughout the ship’s control systems."
" technician tried to digitally calibrate and reset the fuel valve by entering a 0 value for one of the valve’s component properties into the SMCS Remote Database Manager (RDM)"
We discovered back when IE3 came out that you could crash Windows by leaving off a closing table tag.
These days, almost all browsers accept zstd and brotli, so these bombs can be even more effective today! [This](https://news.ycombinator.com/item?id=23496794) old comment showed an impressive 1.2M:1 compression ratio and [zstd seems to be doing even better](https://github.com/netty/netty/issues/14004).
Though, bots may not support modern compression standards. Then again, that may be a good way to block bots: every modern browser supports zstd, so just force that on non-whitelisted browser agents and you automatically confuse scrapers.
So I actually do this (use compression to filter out bots) for my one million checkboxes Datastar demo[1]. It relies heavily on streaming the whole user view on every interaction. With brotli over SSE you can easily hit 200:1 compression ratios[2]. The problem is a malicious actor could request the stream uncompressed. As brotli is supported by 98% of browsers I don't push data to clients that don't support brotli compression. I've also found a lot of scrapers and bots don't support it so it works quite well.
[1] checkboxes demo https://checkboxes.andersmurphy.com
[2] article on brotli SSE https://andersmurphy.com/2025/04/15/why-you-should-use-brotl...
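For illustration, a minimal WSGI-style sketch of the Accept-Encoding gate described above (not the demo's actual code; stream_compressed_view is a hypothetical placeholder):

    def accepts_brotli(environ) -> bool:
        # Real browsers advertise "br" in Accept-Encoding; many bots don't.
        accepted = environ.get("HTTP_ACCEPT_ENCODING", "")
        encodings = {part.split(";")[0].strip() for part in accepted.split(",")}
        return "br" in encodings

    def app(environ, start_response):
        if not accepts_brotli(environ):
            start_response("406 Not Acceptable", [("Content-Type", "text/plain")])
            return [b"brotli support required\n"]
        start_response("200 OK", [("Content-Type", "text/event-stream"),
                                  ("Content-Encoding", "br")])
        return stream_compressed_view(environ)   # hypothetical: the brotli-compressed SSE stream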
If you nest the gzip inside another gzip it gets even smaller since the blocks of compressed '0' data are themselves low entropy in the first generation gzip. Nested zst reduces the 10G file to 99 bytes.
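A quick way to check that claim with Python's gzip module (exact sizes vary by level and input length):

    import gzip

    zeros = b"\0" * (10 * 1024 * 1024)     # 10 MB of zeros for a quick test
    once = gzip.compress(zeros, 9)
    twice = gzip.compress(once, 9)
    print(len(zeros), len(once), len(twice))
    # The second pass shrinks it much further, because the first pass's output
    # for a long run of zeros is itself highly repetitive.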
How will my browser react to receiving such a bomb? I'd rather not test it myself…
gzip is everywhere and it will mess with every crawler.
> At my old employer, a bot discovered a wordpress vulnerability and inserted a malicious script into our server
I know it's slightly off topic, but it's just so amusing (edit: reassuring) to know I'm not the only one who, after an hour of setting up WordPress, finds a PHP shell magically deployed on their server.
>Take over a wordpress site for a customer
>Oh look 3 separate php shells with random strings as a name
Never less than 3, but always guaranteed.
Yes, never self host Wordpress if you value your sanity. Even if it’s not the first hour it will eventually happen when you forget a patch.
I never hosted WP, but as soon as you have an HTTP server exposed to the internet you will get requests to /wp-login and such. It has become a good way to find bots, too. If I see an IP requesting anything from a popular CMS, hop, it goes into the iptables hole.
Wordpress is indeed a nice backdoor, it even has CMS functionality built in.
>after 1 hour
I've used this when teaching folks devops: here, deploy your first hello-world nginx server... huh, what are those strange requests in the log?
There are ways to prevent it:
- Freeze all code after an update through permissions
- Don't make most directories writeable
- Don't allow file uploads, or limit file uploads to media
There's a few plugins that do this, but vanilla WP is dangerous.
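For illustration, a rough sketch of the "freeze code through permissions" idea above, assuming a typical install layout; real setups also need correct ownership and to re-apply this after each update:

    import os

    WP_ROOT = "/var/www/wordpress"                     # hypothetical install path
    UPLOADS = os.path.join(WP_ROOT, "wp-content", "uploads")

    for dirpath, dirnames, filenames in os.walk(WP_ROOT):
        writable = dirpath.startswith(UPLOADS)
        os.chmod(dirpath, 0o755 if writable else 0o555)     # dirs: drop write bit outside uploads
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chmod(path, 0o644 if writable else 0o444)    # files: read-only outside uploads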
I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server. I switched to just identifying 'bad actors' who are clearly trying to do bad things and just banning their IP with firewall rules. That's becoming more challenging with IPV6 though.
Edit: And for folks who write their own web pages, you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors). Bots download those things to have a look (so do crawlers and AI scrapers)
> you can always create zip bombs that are links on a web page that don't show up for humans
I did a version of this with my form for requesting an account on my fediverse server. The problem I was having is that there exist these very unsophisticated bots that crawl the web and submit their very unsophisticated spam into every form they see that looks like it might publish it somewhere.
First I added a simple captcha with distorted characters. This did stop many of the bots, but not all of them. Then, after reading the server log, I noticed that they only make three requests in rapid succession: the page that contains the form, the captcha image, and then the POST request with the form data. They load neither the CSS nor the JS.
So I added several more fields to the form and hid them with CSS. Submitting anything in these fields fails the request and bans your session. I also modified the captcha: I made the image itself a CSS background and pointed the src at a transparent image instead.
And just like that, the spam completely stopped, while real users noticed nothing.
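For anyone wanting to replicate this, a minimal sketch of the server-side honeypot check (the field names are made up for illustration):

    # Hidden fields real users never see (hidden via CSS in the form markup).
    HONEYPOT_FIELDS = ("website", "fax_number", "nickname")

    def is_spam_submission(form: dict) -> bool:
        # Any value in a honeypot field means the client blindly filled in
        # every input it found, i.e. it's a bot: reject and ban the session.
        return any(form.get(field) for field in HONEYPOT_FIELDS)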
Check this out if you want to stop this behavior...
https://github.com/skeeto/endlessh
> you can always create zip bombs that are links on a web page that don't show up for humans (white text on white background with no highlight on hover/click anchors)
RIP screen reader users?
Why is it harder to firewall them with IPv6? I seems this would be the easier of the two to firewall.
These links do show up for humans who might be using text browsers, (perhaps) screen readers, bookmarklets that list the links on a page, etc.
> I sort of did this with ssh where I figured out how to crash an ssh client that was trying to guess the root password. What I got for my trouble was a number of script kiddies ddosing my poor little server.
This is the main reason I haven't installed zip bombs on my website already -- on the off chance I'd make someone angry and end up having to fend off a DDoS.
Currently I have some URL patterns to which I'll return 418 with no content, just to save network / processing time (since if a real user encounters a 404 legitimately, I want it to have a nice webpage for them to look at).
Should probably figure out how to wire that into fail2ban or something, but not a priority at the moment.
Automated systems like Cloudflare and stuff also have a list of bot IPs. I was recently setting up a selfhosted VPN and I had to change the IPv4 of the server like 20 times before I got an IP that wasn't banned on half the websites.
I am just banning large swaths of IPs. Banning most of Asia and the middle east reduced the amount of bad traffic by something like 98%.
fail2ban automates that and is in package managers
Zip bombs are fun. I discovered a vulnerability in a security product once where it wouldn’t properly scan a file for malware if the file was or contained a zip archive greater than a certain size.
The practical effect of this was you could place a zip bomb in an office xml document and this product would pass the ooxml file through even if it contained easily identifiable malware.
Eh I got news for ya.
The file size problem is still an issue for many big name EDRs.
I deployed this, instead of my usual honeypot script.
It's not working very well.
In the web server log, I can see that the bots are not downloading the whole ten megabyte poison pill.
They are cutting off at various lengths. I haven't seen anything fetch more than around 1.5 MB of it so far.
Or is it working? Are they decoding it on the fly as a stream, and then crashing? E.g. if something is recorded as having read 1.5 MB, could it have decoded it to 1.5 GB in RAM, on the fly, and crashed?
There is no way to tell.
Try a content labyrinth, i.e. infinitely generated content with a bunch of references to other generated pages. It may help against simple wget, at least until the bots adapt.
PS: I'm on the bots' side, but I don't mind helping.
Perhaps need to semi-randomize the file size? I'm guessing some of the bots have a hard limit to the size of the resource they will download.
Many of these are annoying LLM training/scraping bots (in my case anyway). So while it might not crash them if you spit out a 800KB zipbomb, at least it will waste computing resources on their end.
Do they come back? If so, they detect it and avoid it. If not, they crashed and mission accomplished.
It's worth noting that this is a gzip bomb (acts just like a normal compressed webpage), not a classical zip file that uses nested zips to knock out antiviruses.
There was an incident a little while back where some Tor Project anti-censorship infrastructure was run on the same site as a blog post about zip bombs.[0] One of the zip files got crawled by Google, and added to their list of malicious domains, which broke some pretty important parts of Tor's Snowflake tool. Took a couple weeks to get it sorted out.[1]
[0] https://www.bamsoftware.com/hacks/zipbomb/ [1] https://www.bamsoftware.com/hacks/zipbomb/#safebrowsing
I protected uploads on one of my applications by creating fixed-size temporary disk partitions of about 10MB each and unzipping into those, which contains the fallout if someone uploads something too big.
`unzip -p | head -c 10MB`
What? You partitioned a disk rather than just not decompressing some comically large file?
I do something similar using a script I've cobbled together over the years. Once a year I'll check the 404 logs and add the most popular paths trying to exploit something (ie ancient phpmyadmin vulns) to the shitlist. Requesting 3 of those URLs adds that host to a greylist that only accepts requests to a very limited set of legitimate paths.
There is a similar thing for ssh servers, called endlessh (https://github.com/skeeto/endlessh). In the ssh protocol the client must wait for the server to send back a banner when it first connects, but there is no limit on its size! So this program sends an infinite banner very... very slowly, and makes the crawler or script kiddie's script hang indefinitely or just crash.
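Not endlessh itself, but a toy sketch of the same idea (the SSH spec lets a server send arbitrary lines before its "SSH-" identification string, so clients sit there reading them):

    import random
    import socket
    import threading
    import time

    def tarpit_client(conn):
        try:
            # Drip meaningless banner lines forever; none start with "SSH-",
            # so the client keeps waiting for the real identification string.
            while True:
                conn.send(b"%x\r\n" % random.getrandbits(32))
                time.sleep(10)
        except OSError:
            pass
        finally:
            conn.close()

    def main(port=2222):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen(64)
        while True:
            conn, _addr = srv.accept()
            threading.Thread(target=tarpit_client, args=(conn,), daemon=True).start()

    if __name__ == "__main__":
        main()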
Attacked Over Tor [2017]
https://www.hackerfactor.com/blog/index.php?/archives/762-At...
The same, for Caddy: https://www.dustri.org/b/serving-a-gzip-bomb-with-caddy.html
10T is probably overkill though.
Hilarious because the author, and the OP author, are literally zipping `/dev/null`. While they realize that it "doesn't take disk space nor ram", I feel like the coin didn't drop for them.
Think about it:
Other than that, why serve gzip anyway? I would not set the Content-Length header, and I would throttle the connection, set the MIME type to something random (hell, just octet-stream), and redirect to '/dev/random'.
I don't get the 'zip bomb' concept, all you are doing is compressing zeros. Why not compress '/dev/random'? You'll get a much larger file, and if the bot receives it, it'll have a lot more CPU cycles to churn.
Even the OP article states that after creating the '10GB.gzip' that 'The resulting file is 10MB in this case.'.
Is it because it sounds big?
Here is how you don't waste time with 'zip bombs':

    $ time dd if=/dev/zero bs=1 count=10M | gzip -9 > 10M.gzip
    10485760+0 records in
    10485760+0 records out
    10485760 bytes (10 MB, 10 MiB) copied, 9.46271 s, 1.1 MB/s
    real    0m9.467s
    user    0m2.417s
    sys     0m14.887s
    $ ls -sh 10M.gzip
    12K 10M.gzip
    $ time dd if=/dev/random bs=1 count=10M | gzip -9 > 10M.gzip
    10485760+0 records in
    10485760+0 records out
    10485760 bytes (10 MB, 10 MiB) copied, 12.5784 s, 834 kB/s
    real    0m12.584s
    user    0m3.190s
    sys     0m18.021s
    $ ls -sh 10M.gzip
    11M 10M.gzip
As an aside, there are a lot of people out there standing up massive microservice implementations¹ for relatively small sites/apps, which need to have this part printed, wrapped around a brick, and lobbed at their heads:
> A well-optimized, lightweight setup beats expensive infrastructure. With proper caching, a $6/month server can withstand tens of thousands of hits — no need for Kubernetes.
----
[1] Though doing this in order to play/learn/practise is, of course, understandable.
IsMalicious() doing some real heavy lifting in that pseudo code. Would love to see a bit more under THAT hood.
It's probably watching for connections to files listed in robots.txt that should not be crawled, etc. Once a client tries to do that thing (which it was told not to do), then it gets tagged malicious and fed the zip file.
Long story short, I use memcached to track IPs, user agents, and use of the POST method. The requests per minute, request payload, and past behavior will make isMalicious() return true.
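Roughly, in Python (an in-memory stand-in for the memcached counters; the thresholds and probe paths below are made up for illustration):

    import time
    from collections import defaultdict, deque

    WINDOW_S = 60
    MAX_REQUESTS_PER_WINDOW = 120
    PROBE_PATHS = ("/wp-login.php", "/xmlrpc.php", "/.env", "/phpmyadmin")

    hits = defaultdict(deque)     # per-IP request timestamps

    def is_malicious(ip: str, path: str, method: str, user_agent: str) -> bool:
        now = time.time()
        q = hits[ip]
        q.append(now)
        while q and now - q[0] > WINDOW_S:
            q.popleft()
        if len(q) > MAX_REQUESTS_PER_WINDOW:
            return True                               # hammering the server
        if any(path.startswith(p) for p in PROBE_PATHS):
            return True                               # probing for known exploit paths
        if method == "POST" and not user_agent:
            return True                               # header-less POSTs are rarely human
        return False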
I know I've been on THAT list before. Heaven forbid I don't have Chrome or keep it up to date, shame on me!
I'm curious why a 10GB file of all zeroes would compress only to 10MB. I mean theoretically you could compress it to one byte. I suppose the compression happens on a stream of data instead of analyzing the whole, but I'd assume it would still do better than 10MB.
A compressed file that is only one byte long can only represent maximally 256 different uncompressed files.
Signed, a kid in the 90s who downloaded some "wavelet compression" program from a BBS because it promised to compress all his WaReZ even more so he could then fit moar on his disk. He ran the compressor and hey golly that 500MB ISO fit into only 10MB of disk now! He found out later (after a defrag) that the "compressor" was just hiding data in unused disk sectors and storing references to them. He then learned about Shannon entropy from comp.compression.research and was enlightened.
It has to cater for any possible input. Even with special case handling for this particular (generally uncommon) case of vast runs of the same value: the compressed data will probably be packetized somehow, and each packet can reproduce only so many repeats, so you'll need to repeat each packet enough times to reproduce the output. With 10 GB, it mounts up.
I tried this on my computer with a couple of other tools, after creating a file full of 0s as per the article.
gzip -9 turns it into 10,436,266 bytes in approx 1 minute.
xz -9 turns it into 1,568,052 bytes in approx 4 minutes.
bzip2 -9 turns it into 7,506 (!) bytes in approx 5 minutes.
I think OP should consider getting bzip2 on the case. 2 TBytes of 0s should compress nicely. And I'm long overdue an upgrade to my laptop... you probably won't be waiting long for the result on anything modern.
I get your point (and have no idea why it isn't compressed more), but is the theoretical value of 1 byte correct? With just one single byte, how does it know how big the file should be after being decompressed?
Good question. The "ultimate zip bomb" looks something like https://github.com/iamtraction/ZOD - this produces the infamous "42.zip" file, which is about 42KiB, but expands to 3.99 PiB (!).
There's literally no machine on Earth today that can deal with that (as a single file, I mean).
It'd have to be more than one byte. There's the central directory, the zip header, and the local header, then for the file itself you need to tell it how many zeros to produce when decompressing. Most compression algorithms don't work like that, because they're designed for actual files rather than essentially blank ones, so you end up larger than the absolute minimum.
There probably aren’t any perfectly lossless compression algorithms, I guess? Nothing would ever be all zeroes, so it might not be an edge case accounted for or something? I have no idea, just pulling at strings. Maybe someone smarter can jump in here.
It requires at least a few bytes; there is no way to represent 10GB of data in 8 bits.
There's around a 64KB block size limit for a block of compressed data. That sets a max compression ratio.
gzip isn't optimal for this case. It divides the file into blocks and each one has a header. Apparently that's about 1 byte per 1000.
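Easy to check empirically (the exact ratio depends on the gzip level, but it lands in the ~1000:1 ballpark):

    import gzip

    data = b"\0" * (100 * 1024 * 1024)        # 100 MB of zeros
    packed = gzip.compress(data, 9)
    print(len(data) / len(packed))            # roughly 1000, i.e. ~1 byte of output per KB of zeros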
Is there a list of popular attack vector urls located somewhere? I want to just auto-ban anyone sniffing for .env or ../../../../ etc.
Rather not write it myself
check out the lists in this repo
https://github.com/danielmiessler/SecLists/blob/master/Disco...
I combined a few of the most interesting lists from here into one and never miss an attack now
It would be a fairly short Perl script to read the access logs and curl a HEAD request to all URLs accessed, printing only those with 200 OK responses.
Here's a start hacked together and tested on my phone:

    perl -lnE 'if (/GET ([^ ]+)/ and $p=$1) {
      $s=qx(curl -sI https://BASE_URL/$p | head -n 1);
      unless ($s =~ /200|302/) {
        say $p
      }
    }'
Also interested in this. For now I've left a server up for a couple of weeks, went through the logs and set up fail2ban for the most common offenders. Once a month or so I keep checking for offenders but the first iteration already blocked many of them.
Check out Modsecurity WAF and CoreRuleSet.
As I don't use PHP in my server, but get a lot of requests for various PHP related stuff, I added a rule to serve a Linux kernel encrypted with a "passphrase" derived from /dev/urandom as a reply for these requests. A zip bomb might be a worse reply ...
For all those "eagerly" fishing for content AI bots I ponder if I should set up a Markov chain to generate semi-legible text in the style of the classic https://en.wikipedia.org/wiki/Mark_V._Shaney ...
15+ years ago I fought piracy at a company with very well known training materials for a prestigious certification. I'd distribute zip bombs marked as training material filenames. That was fun.
Is there any legal exposure possible?
Like, a legitimate crawler suing you and alleging that you broke something of theirs?
Disclosure: IANAL
The CFAA[1] prohibits:
> knowingly causes the transmission of a program, information, code, or command, and as a result of such conduct, intentionally causes damage without authorization, to a protected computer;
As far as I can tell (again, IANAL) there isn't an exception if you believe said computer is actively attempting to abuse your system[2]. I'm not sure if a zip bomb would constitute intentional damage, but it is at least close enough to the line that I wouldn't feel comfortable risking it.
[1]: https://www.law.cornell.edu/uscode/text/18/1030
[2]: And of course, you might make a mistake and incorrectly serve this to legitimate traffic.
Just crossed my mind that perhaps lots of bot traffic is coming from botnets of unaware victims who downloaded a shitty game or similar, orchestrated by a malicious C&C server somewhere else. (There was a post about this type of malware recently.) Now, if you crash the victims machine, it’s complicated at least ethically, if not legally.
Please, just as a conversation piece, walk me through the potential issues you think there are.
I'll play the side of the defender and you can play the "bot" / bot deployer.
Though anyone can sue anyone, not doing X is the simplest thing that might avoid being sued for doing X.
But if it matters pay your lawyer and if it doesn’t matter, it doesn’t matter.
> User-agent: *
> Disallow: /zipbomb.html
Legitimate crawlers would skip it this way; only scum ignores robots.txt.
> Before I tell you how to create a zip bomb, I do have to warn you that you can potentially crash and destroy your own device
Surely, the device does crash but it isn’t destroyed?
This topic comes up from time to time and I'm surprised no one yet mentioned the usual fearmongering rhetoric of zip bombs being potentially illegal.
I'm not a lawyer, but I have yet to see a real-life court case of a bot owner suing a company or an individual for responding to his malicious request with a zip bomb. The usual spiel goes like this: responding to his malicious request with a malicious response makes you a cybercriminal and allows him (the real cybercriminal) to sue you. Again, apart from cheap talk, I've never heard of a single court case like this. But I can easily imagine them trying to blackmail someone with such cheap threats.
I cannot imagine a big company like Microsoft or Apple using zip bombs, but I fail to see why zip bombs would be considered bad in any way. Anyone with an experience of dealing with malicious bots knows the frustration and the amount of time and money they steal from businesses or individuals.
> For the most part, when they do, I never hear from them again. Why? Well, that's because they crash right after ingesting the file.
I would have figured the process/server would restart, and restart with your specific URL since that was the last one not completed.
What makes the bots avoid this site in the future? Are they really smart enough to hard-code a rule to check for crashes and avoid those sites in the future?
This post is suspiciously similar to my post from 2017 "How to defend your website with ZIP bombs"
https://blog.haschek.at/2017/how-to-defend-your-website-with...
I also had the idea of zip bomb to confuse badly behaved scrapers (and I have mentioned it before to some other people, although I did not implemented it). However, maybe instead of 0x00, you might use a different byte value.
I had other ideas too, but I don't know how well some of them will work (they might depend on what bots they are).
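Generating a bomb from a different filler byte is a one-line change; a sketch using Python's gzip module (any repeated byte compresses about as well as zeros):

    import gzip

    FILLER = b"\x41" * (1024 * 1024)          # 1 MiB of a non-zero byte ("A")
    with gzip.open("1G.gz", "wb", compresslevel=9) as f:
        for _ in range(1024):                 # 1 GiB uncompressed in total
            f.write(FILLER)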
See https://research.swtch.com/zip for how to make an infinite zip bomb: ie a zip file that unzips to itself, so you can keep unzipping forever without ever hitting bottom.
See also (2017) HN, https://news.ycombinator.com/item?id=14707674
I think it's a good idea, but it must be coupled with robots.txt.
I am ignorant as to how most bots work. Could you have a second line of defense for bots that avoid this bomb: Dynamically generate a file from /dev/random and trickle stream it to them, or would they just keep spawning parallel requests? They would never finish streaming it, and presumably give up at some point. The idea would be to make it more difficult for them to detect it was never going to be valid content.
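Something like that could be sketched as a streaming response generator (the Flask wiring shown in the comment is hypothetical):

    import os
    import time

    def trickle():
        # Drip a few random bytes forever; the client never gets a complete
        # response and ties up one of its slots waiting, at almost no cost to us.
        while True:
            yield os.urandom(32)
            time.sleep(1.0)

    # e.g. with Flask (hypothetical route):
    # return Response(trickle(), mimetype="application/octet-stream")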
It is surprising that it works (I haven't tried it). `Content-Length` had one goal: to ensure data integrity by comparing the response size with the header value. I'd expect HTTP clients to deal with this out of the box, whether gzip or not. Is that not the case? If so, that changes everything; a lot of servers need priority updates.
The hard part is the content of the isMalicious() function. The bots can crash, but they'd be quick to restart anyway.
Do you mind sharing your specs of your digital ocean droplet? I'm trying to setup one with less cost.
If anyone is interested in writing a guide to set this up with crowdsec or fail2ban I'm all ears
"On my server, I've added a middleware that checks if the current request is malicious or not"
How accurate is that middleware? Obviously there are false negatives as you supplement with other heuristics. What about false positives? Just collateral damage?
Can someone explain why mods change post titles? What value does it provide in their mind?
I guess it goes without saying that the first thing should be to follow security best practices (patch vulnerabilities fast, etc.) before doing things like this. Then maybe his first website wouldn't have been compromised either.
I can't imagine using anything other than a stream interface when dealing with web requests in a crawler.
You need that to protect against not only these types of shenanigans, but also large or slow responses.
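For example, a capped, streaming fetch using the requests library (limits are arbitrary); note that counting bytes after decompression is exactly what stops a gzip bomb:

    import requests                      # assumes the requests library

    MAX_BYTES = 5 * 1024 * 1024          # refuse to ingest more than 5 MB per URL
    TIMEOUTS = (5, 30)                   # connect timeout, read timeout (seconds)

    def fetch_capped(url: str) -> bytes:
        body = bytearray()
        with requests.get(url, stream=True, timeout=TIMEOUTS) as resp:
            for chunk in resp.iter_content(chunk_size=65536):
                body.extend(chunk)       # iter_content yields decompressed bytes
                if len(body) > MAX_BYTES:
                    break                # bail out instead of inflating a bomb
        return bytes(body)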
I like a similar trick, sending very large files hosted on external servers to malicious visitors using proxies. Usually those proxies charge by bandwidth, so it increases their costs.
"But when I detect that they are either trying to inject malicious attacks, or are probing for a response" how are you detecting this? mind sharing some pseudocode?
Wouldn't it be cheaper to use Cloudflare than task a human to obsessively watch webserver logs on a box lacking proper filtering?
There's a lot of creative ideas out there for banning and/or harassing bots. There's tarpits, infinite labyrinths, proof of work || regular challenges, honeypots etc.
Most of the bots I've come across are fairly dumb however, and those are pretty easy to detect & block. I usually use CrowdSec (https://www.crowdsec.net/), and with it you also get to ban the IPs that misbehave on all the other servers that use it before they come to yours. I've also tried turnstile for web pages (https://www.cloudflare.com/application-services/products/tur...) and it seems to work, though I imagine most such products would, as again most bots tend to be fairly dumb.
I'd personally hesitate to do something like serving a zip bomb since it would probably cost the bot farm(s) less than it would cost me, and just banning the IP I feel would serve me better than trying to play with it, especially if I know it's misbehaving.
Edit: Of course, the author could state that the satisfaction of seeing an IP 'go quiet' for a bit is priceless - no arguing against that
If one wanted to create the ICE of cyberpunk's cyberspace, capable of destroying the device...
Zip libraries aren’t bomb proof yet? Seems fairly easy to detect and ignore, no?
But what about the bots written in Rust? Will that get rid of them too?
OP: Hi guys this is how I fend off hackers! Hackers: Note taken.
It'd be cool to have a proof-of-work protocol baked into HTTP, like a header that browsers understood.
Serving a zip bomb is pretty illegal. The bot will restart its process anyway, and carry on as if nothing happened.
OK, but where do I put this? In the files directory?
This was a cool read. Very interesting stuff.
Mildly amusing, but it seems like this is thinking that two wrongs make a right, so let us serve malware instead of using a WAF or some other existing solution to the bot problem.