I don't see any mention of enabling kTLS (TLS in the kernel). I'd suggest re-running the benchmark with kTLS enabled: https://www.f5.com/company/blog/nginx/improving-nginx-perfor...
Also it doesn't look like they enabled sendfile() in the nginx conf: https://nginx.org/en/docs/http/ngx_http_core_module.html#sen...
The combination of sendfile and kTLS should avoid round-trips to userland while sending files.
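For the curious, enabling both is a small config change. A minimal sketch, assuming nginx 1.21.4+ built against OpenSSL 3.0+ on a kernel with the tls module loaded (cert paths here are placeholders):

```nginx
# Run `modprobe tls` first: the kernel TLS module must be loaded.
http {
    sendfile on;  # zero-copy file serving for plain HTTP

    server {
        listen 443 ssl;
        ssl_certificate     /etc/ssl/example.crt;  # placeholder paths
        ssl_certificate_key /etc/ssl/example.key;
        ssl_protocols TLSv1.3;
        ssl_conf_command Options KTLS;  # ask OpenSSL to hand record encryption to the kernel
        root /var/www/html;
    }
}
```

With that in place, nginx can use sendfile on TLS connections too, instead of only on plain HTTP.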
True, but the other OSes don't support that. If the goal is out-of-the-box testing, kTLS would not be representative.
IMHO, it might be worthwhile for NGINX to default to sendfile+kTLS where appropriate. Maybe the potential for a negative experience is too high.
I know sendfile originally had some sharp edges, but I'm not sure how sharp those edges still are. You would need to use sendfile only for plain HTTP or for HTTPS with kTLS, and maybe that's too complex? Apache lists some issues [1] with sendfile and defaults to off as well; but I don't know how many sites are still serving 2GB+ files on Itanium. :P AFAIK, lighttpd added SSL_sendfile support, enabled by default, about three years ago, and you can turn it off if you want.
I think there's also some complexity with kTLS implementations that limit protocol version and cipher choices. If kTLS is opt-in, it makes sense to refuse to run when the configured cipher selection conflicts with kTLS cipher availability; but if kTLS is on by default, you probably need to fall back to traditional TLS for connections where the client selects a cipher that's not eligible for kTLS. Maybe that's extra code that nobody wants to write; maybe the inconsistency of performance depending on client cipher choice is unacceptable. But it seems like a worthwhile thing to me (but I didn't make a PR, did I?)
[1] https://httpd.apache.org/docs/2.4/mod/core.html#enablesendfi...
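That fallback logic isn't much code in sketch form. A hypothetical illustration; the eligible set below is an assumption, since actual kTLS cipher support varies by kernel version and loaded modules:

```python
# Hypothetical sketch: decide per-connection whether the kTLS fast path
# can be used, falling back to userspace TLS otherwise. The eligible set
# is illustrative, not authoritative for any particular kernel.
KTLS_ELIGIBLE = {
    "TLS_AES_128_GCM_SHA256",
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",  # newer kernels only
}

def transmit_path(negotiated_cipher: str) -> str:
    """Return which data path to use for this connection."""
    if negotiated_cipher in KTLS_ELIGIBLE:
        return "ktls+sendfile"   # zero-copy path
    return "userspace-tls"       # traditional read + encrypt + write

print(transmit_path("TLS_AES_128_GCM_SHA256"))            # ktls+sendfile
print(transmit_path("TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA"))  # userspace-tls
```

The hard part isn't the branch, it's plumbing two send paths through the server and living with the performance variance between them.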
That makes no sense. Why would you not be testing with optimized hosting?
If one of the OSs has features that improve performance, why would you not include that in the comparison?
Just my two cents: as an end-user choosing an OS to use on an N150 for static web hosting, I would sure like to know if those features make a meaningful difference.
But I also understand that looking into that might be beyond the scope of the article.
Exactly. That's why I didn't enable it
But that said, it would be interesting to see the different systems after a tuning pass, both as an example of capability and as a mechanism to discuss the tuning options available to users.
Mind, the whole "it's slow, get new hardware" attitude comes from the fact that getting another 10% by tuning "won't fix the problem". By the time folks feel the sluggish performance, you're probably not looking for another 10 points. The 10 points matter at scale, to lower overall hardware costs: 10% less hardware across 1000 servers is a different problem than 10% less hardware with just one.
But, still, a tuning blog would be interesting, at least to me.
The numbers seem too close to 65535 to be a coincidence.
Are you making the requests from a single source IP address? Are you aware of the limit on connections from one source IP to one destination IP and port? Each connection needs a unique (source address, source port) pair for a given destination, maxing out at 65535 source ports per destination.
I wonder if that's why the CPU is idle for part of the time; it's waiting for sockets to become free.
I would expect HTTP persistent connections (keep-alive) at these rates. It's very hard to get 64k connections/second from a single IP to a single server ip:port without heavily tuning the client, which they don't mention doing. They're only testing for 10 seconds, but still, you'd need to clear all the closed connections out of TIME_WAIT pretty darn quick in order to re-use each port 10 times.
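Back-of-the-envelope math shows why keep-alive is almost certainly in play. The numbers below are assumptions taken from Linux defaults (ephemeral port range and TIME_WAIT length), not measurements from the article:

```python
# Rough estimate of the sustainable new-connection rate from one client IP
# to one server ip:port, without keep-alive or TIME_WAIT tuning.
ephemeral_ports = 60999 - 32768 + 1   # default net.ipv4.ip_local_port_range
time_wait_secs = 60                   # typical TIME_WAIT duration

sustained_rate = ephemeral_ports / time_wait_secs
print(f"{ephemeral_ports} ports / {time_wait_secs}s ~= {sustained_rate:.0f} conn/s")
# Hundreds of connections/second, far below 63k requests/second,
# so each connection must be carrying many requests.
```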
Sucks that there's no ECC-RAM model. A phone-sized x86 slab, as opposed to those impractical mini-PC/Mac-mini boxes, that one could carry around and connect to a power bank of similar size, and/or various types of screens (including a smartphone itself), would make for a great ultramobile setup.
The Odroid H4 family (H4, H4 Plus, H4 Ultra) supports in-band ECC, which provides single-bit error correction and double-bit error detection. And the 8-core model is just $220 (+case, +heatsink/fan, +shipping, but oh well).
Is the kernel support for those still awful or has it gotten better? It's been a long time since I had an Odroid... a C1, I think.
The Odroid H4 is an amd64 board like the N150 NUC, so the kernel should be standard amd64.
If you want a relatively small low-power box with ECC, check out the Asustor AS6804T. It is nominally a NAS, but really you can use it for anything you want; it is just an x86-64 server with some disk bays. You also get a nice 2x10GbE, which is rare among these mini PCs.
If it had a few more cores, something like this would make for a great node in a distributed system like k8s or Ceph for a homelab. At the asking price, however, one could also cross-shop an HP MicroServer Gen11.
Odroid H4 Ultra? It has 8 Gracemont cores that can stay boosted for quite a long time, and supports in-band ECC. 4x SATA too for those who care.
But the price of that is $1200, which is about 5 times the price of the average N150 mini PC.
Bring back the Intel Compute Stick? https://liliputing.com/this-cheap-intel-n150-mini-pc-is-smal...
The Arm RK3399 SoC is blob-free, and some devices (Pinephone Pro, N4S, Chrome tablet) are small enough for sidecar usage.
How many times do you think ECC RAM has caught an error? Online anecdotes I've found indicate almost no one experiences regularly corrected errors that weren't due to imminently failing hardware.
I've managed a couple thousand servers with ECC. The vast majority had zero reported errors over their whole life. Of those that reported errors, there were a few categories:
Some reported a couple errors a day for months (maybe years?) but worked fine.
Some ramped up error counts over hours or days.
Some went from zero to lots in one step.
A few managed to hit uncorrectable errors; sometimes just once.
For a small number of correctable errors (< 10/day), or a single uncorrectable, no action was needed; that kind of failure is what drives people without ECC crazy, but some of the machines that hit an uncorrectable only did it once and were fine. For the other machines we'd replace the RAM. The ones with a few daily errors or a single uncorrectable were less common than the ones that got their RAM swapped. I don't know for sure whether uncorrectables correlated with many correctable errors, because correctable errors were only reported hourly; if it was a step change to bad RAM, the machine would likely halt before a reporting interval, so no report. Unless the correctables were several per second, the impact of corrections wasn't obvious.
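On Linux, the kind of periodic error reporting described above can be built on the EDAC sysfs counters. A minimal sketch; the `mc*/ce_count` layout is the standard EDAC interface, and the `root` parameter is only there so the function can be exercised against a fake tree:

```python
import glob
import os

def ecc_error_counts(root="/sys/devices/system/edac/mc"):
    """Sum corrected (ce) and uncorrected (ue) error counts across all
    memory controllers exposed by the Linux EDAC subsystem."""
    totals = {"ce": 0, "ue": 0}
    for kind in totals:
        for path in glob.glob(os.path.join(root, "mc*", f"{kind}_count")):
            with open(path) as f:
                totals[kind] += int(f.read().strip())
    return totals

# e.g. run hourly from cron and alert when either count steps up:
# print(ecc_error_counts())
```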
Fun fact: DDR6 includes built-in ECC by default. RAM sizes are getting so large that it's causing issues in the field, and also issues with yields.
So the industry thinks it's a problem.
DDR5 has built-in (on-die) ECC too. Unfortunately, AFAIK there's no error reporting mechanism, so while it should reduce error rates, it likely increases error severity. Assuming no bitflips between the RAM module and the CPU, the on-die ECC corrects any single bitflip, but multiple flips are uncorrectable and must pass through, so any incorrect value the CPU gets has multiple bitflips.
In other words, the industry has gone to shit as usual, starting with rowhammer.
But my question still stands.
I like to pretend options without ECC simply do not exist. (i.e. as it should be)
It shortens the list of options, making choices much easier.
I love how capable these tiny N150 machines are. I've got one running Debian for my home media and backup solution and it's never stuttered. I'd be curious about exactly what machine they're testing with. I've got the Beelink ME mini running that media server. And I use a Beelink EQ14 as a kind of jump box to remote into my work desktop.
Would you mind sharing the Linux hardware platform security report ("fwupdmgr security") for those Beelink boxes, e.g. what is enabled/disabled by the OEM? N150 SoC supports Intel TXT, which was previously limited to $800+ vPro devices, but it requires BIOS support from OEMs like Beelink. Depending on HSI status, OSS coreboot might be feasible on some N150 boxes.
https://fwupd.github.io/libfwupdplugin/hsi.html
I'm not the author, but my parents have pretty much decided they will never use a game console newer than the Nintendo Wii, and so far two of their Wiis have died. Since no one is making Wiis anymore, I decided to future-proof their gaming by setting them up with a Mele Quieter 4C [0], with the official Wii Bluetooth module attached over USB for perfect Wiimote compatibility, running the Dolphin emulator. Not every game runs perfectly, but every game they want to play runs perfectly, AND it is smaller, silent, and consumes less power than the real Wii.
[0] My experience with that mini computer: I bought two. The first one was great, but the 2nd one had coil whine so I had to return it. Aside from the whine, I love the box. If I could guarantee I wouldn't get whine I'd buy another today.
It's a Minisforum UN150P
HSI report on that box would be useful.
I didn't see the size of the test page mentioned as I went through (did I miss it?), and I think in this case it potentially matters. A 2.5 Gbps link can do ~280 MB/s, which at 63k requests/second is just ~4.55 KB per request. That could easily be a single page saturating the link, explaining the clustering at that value.
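Working the numbers in that comment (the ~280 MB/s figure assumes some protocol overhead on a 2.5 Gbps link, and KB here is 1024 bytes):

```python
# Reproduce the back-of-the-envelope estimate of bytes per request
# at a saturated 2.5 Gbps link.
link_MBps = 280            # ~2.5 Gbps minus framing/TCP overhead
requests_per_sec = 63_000  # observed request rate

per_request_KB = link_MBps * 1024 / requests_per_sec
print(f"~{per_request_KB:.2f} KB per request")  # ~4.55 KB per request
```

So any payload around that size would look network-bound, not server-bound.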
This is all on a quad-core Intel processor. It must be noted that most of these OSes, with the exception of NetBSD, can't efficiently handle heterogeneous core configurations like what you find on more powerful Intel processors.
Love this! I have been running an N150 with Debian 13 as my daily driver and I'm super impressed! For ~$150 it packs a punch!
Could you recommend make/model? Quality seems variable at those price points.
The Minix 0db machines are great. The N150 model is $250 but it’s fanless which is a handy feature if you’re prone to dust, unable (or unwilling!) to clean filters, hate fans, or love chunky blocks of metal:
https://www.minix.com.hk/products/minix-z150-0db-fanless-min...
I bought my first one because it’s silent. I bought my second one because I like chunky blocks of metal.
The Topton/CWWK boxes are consistently decent. Best choice if you want fanless.
For mini pcs, Beelink probably has the best support. I've owned a few and had one replaced under warranty.
The N100 family has been the Raspberry Pi host killer for me. I migrated to one from an RPi 4 and couldn't be happier.
Are you running a Radxa x4 or something else?
No, I’ve got something much bigger than the rpi form factor, but still very small in absolute terms, it isn’t a beelink, but something quite similar.
I have a Pi 5 running with the PoE+ M.2 hat from Waveshare. I absolutely detest the boot loader shenanigans and the limited support from OSes other than a select few. For comparison, I also have a Pi 5 where, after dumping a few files into a FAT partition, it had UEFI rolling and I just next->next->finished my favorite OS onto it.
Not a lot of options for N100 with PoE+ though. There is the Radxa X4, but that's hard to find, and the MS S100 is quite locked down in terms of storage.
Do you have GPIOs?
Everything that needs GPIO is attached to ESP32s all over, so technically no, not on the N100 box.
I'm sure one of my Beelink N95 boxes has GPIO.
I'd love to see benchmarks that hit CPU or NIC limits; the HTTPS test hit CPU limits on many of the configurations, but inquiring minds want to know how much you can crank out with FreeBSD. Anyway, overload behavior is sometimes very interesting (probably less so for static HTTPS). It may well need more load-generation nodes though; generating load is often harder than handling it.
OTOH, maybe this is a bad test on purpose? The blogger doesn't like running these tests, so do a bad one and hope someone else is baited into running a better one?
All these benchmarking utilities like wrk are notorious for not supporting HTTP/2. Why would you serve static content and not use HTTP/2?
At least one reason could be that `sendfile` is useless when using HTTP/2 or HTTP/3, as you can no longer just dump the contents directly onto a socket. Whether that actually makes a practical difference on modern hardware remains to be seen of course.
There is nothing that prevents you from using sendfile and HTTP/2 at the same time. You still dump the contents directly into the socket.
Yes there is: HTTP/2's and HTTP/3's framing of messages is such that you can't reliably dump a file as-is onto an HTTP/2 connection, as it may exceed the maximum size allowed by a frame.
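Concretely, HTTP/2 caps DATA frames at SETTINGS_MAX_FRAME_SIZE (16384 bytes by default, per RFC 9113), so a server must slice the file and prefix each slice with a 9-byte frame header. A simplified sketch of that framing (flags and stream handling reduced to the minimum):

```python
import struct

MAX_FRAME_SIZE = 16_384  # HTTP/2 default SETTINGS_MAX_FRAME_SIZE

def data_frame_header(length: int, stream_id: int, end_stream: bool = False) -> bytes:
    """Build the 9-byte HTTP/2 frame header for a DATA frame (type 0x0):
    24-bit length, 8-bit type, 8-bit flags, 32-bit stream id."""
    flags = 0x1 if end_stream else 0x0                # END_STREAM flag
    return (struct.pack(">I", length)[1:]             # 24-bit length
            + bytes([0x0, flags])                     # type, flags
            + struct.pack(">I", stream_id))           # reserved bit + stream id

file_size = 1_048_576                     # a 1 MiB file...
frames = -(-file_size // MAX_FRAME_SIZE)  # ...needs 64 DATA frames
print(frames, len(data_frame_header(MAX_FRAME_SIZE, 1)))  # 64 9
```

That interleaving of headers with file bytes is exactly what a naive sendfile of the whole file onto the socket can't produce.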
Imagine what a big piece of iron could do. It makes me think of the recent stories of people who moved out of the cloud and run everything off one or a few bare-metal hosts.
That's the point!
Is there a guide somewhere to what low power CPUs exist in these new mini PC things? I feel like I'm increasingly out of touch.
Mini PCs mostly run N-series Intel CPUs [0][1] nowadays AFAIK.
The cheapest and most popular one is the N150 [2], a replacement for the N100 [3]; the newer chip boosts a bit higher. The 6-7W TDP in the specs is a lie, but these CPUs still have fairly modest consumption, working at about 10-20W on average.
There are some low power chips from AMD, but that's mostly NAS territory. Don't see them a whole lot and don't know much about them either.
[0] https://www.techpowerup.com/cpu-specs/?f=codename_=Gracemont
[1] https://www.techpowerup.com/cpu-specs/?f=codename_=Twin%20La...
[2] https://www.techpowerup.com/cpu-specs/processor-n150.c4109
[3] https://www.techpowerup.com/cpu-specs/processor-n100.c3007
A few days ago I ordered an AMD 6850U-based mini PC (still on its way): 15W TDP, 8 Zen 3+ cores at 2.7-4.7 GHz. On paper, a very good fit for a mini PC. Obviously Zen 4/5 would be nicer, but those are more difficult to find.
A big reason I wanted AMD is that Intel officially supports only 16GB of RAM on these N-series chips. AMD also has 20 Gen4 PCIe lanes vs 9 Gen3 lanes for Intel.
https://www.techpowerup.com/cpu-specs/ryzen-7-pro-6850u.c276...
The N100/N150/N97 have similar performance. Power seems to be 6-12W at idle, depending on the build. RAM is usually limited to 16GB. Low number of PCIe lanes (NAS builds are limited). Cost used to be $100, but now it's up to $120+.
On the AMD side I have a 4700U and a 5700U: similar idle power (12W), similar cost ($200 with 32GB of RAM, now more expensive). A lot more capable than the N100, at a cost.
I use a whole bunch of mini PCs in my lab; they are so much cheaper to run, in electricity and in up-front cost.
While the N100 docs state a 16GB limit, it's known to have no problems with a 32GB module. I run one myself.
There are also higher power AMD devices that work extremely well.
If you're willing to go up to 60W TDP and $500-1000, then they're good enough to run recent Steam games under Linux at 1080p, and LLM inference (if you spring for > ~32GB of RAM).
Like many others on this thread, I’ve had good luck with beelink.
Love these N150 systems. I wonder if the RAM/SSD/misc shortages are going to make these humble $140 boxes like $300+ soon.
Some N150 systems have integrated LPDDR5 from Chinese memory suppliers, who have been increasing production capacity, unlike Korean memory suppliers who have decreased production and increased prices in the face of higher demand. More NAND supplier competition needed.
That is good news, but I have seen some sellers already jump their prices by $100 on Amazon. Perhaps just price gouging to take advantage. I might pick up another if I can get it for ~$140.
I'd really like one that has 2x M.2 slots. I'm very uncomfortable running a server on a single disk.
Also, ECC ram would be nice.
2x M.2 is usually reserved for more expensive (>$200) mini PCs, or NAS-oriented mini PCs, which have trade-offs.
ECC RAM is rare because very few people are asking for it, and it costs extra.
If you don't need speed, you can bifurcate one 4-lane M.2 into 4x 1-lane M.2 slots.
It really should be "nginx static web hosting...", as it seems to be very specifically measuring nginx performance across OSes.
Otherwise, an seL4/LionsOS webserver scenario could be tested.