Windows Kills SMB Speeds When Using Tailscale

(danthesalmon.com)

100 points | by salmon 15 hours ago

38 comments

  • luma 13 hours ago

    This is framed as a problem with Windows, when it’s clearly a problem with Tailscale misreporting its capabilities to the OS. If I have a 100 Gbit and a 1 Gbit interface, it’s perfectly reasonable for the OS to auto-assign route metrics that prefer the much faster interface.

    This is the OS working as designed, switching to Linux won’t help. Tailscale needs to do a better job reporting link characteristics.

    • windexh8er 13 hours ago

      The advertised link speed actually would have nothing to do with the problem if Tailscale had more fine-grained control over how subnet router (SR) routes are distributed. Currently it's all or nothing, and there's no way to ignore specific routes if you're local to that network. Prefix length trumps interface metrics, so it would be better handled in that manner.

      In almost every OS I've seen, interface metrics are only used to break ties between routes of equal prefix length.
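
A minimal Python sketch of the selection order described above: longest matching prefix first, with the interface metric only breaking ties. The addresses, interface names, and metric values are hypothetical and chosen for illustration; real route tables carry more fields and OS-specific details.

```python
import ipaddress


def pick_route(dest, routes):
    """Pick a route for dest: longest matching prefix wins, lowest metric breaks ties."""
    addr = ipaddress.ip_address(dest)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    # Sort key: more specific prefix first (negated length), then lower metric.
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen, r["metric"]),
    )


# Hypothetical case 1: both interfaces hold the same /24, so the metric decides.
same_length = [
    {"prefix": "192.168.1.0/24", "iface": "ethernet", "metric": 25},
    {"prefix": "192.168.1.0/24", "iface": "tailscale", "metric": 5},
]

# Hypothetical case 2: the overlay route is less specific, so the metric is never consulted.
different_length = [
    {"prefix": "192.168.1.0/24", "iface": "ethernet", "metric": 25},
    {"prefix": "192.168.0.0/23", "iface": "tailscale", "metric": 5},
]

print(pick_route("192.168.1.10", same_length)["iface"])       # tailscale
print(pick_route("192.168.1.10", different_length)["iface"])  # ethernet
```

In the first case the lower metric on the overlay interface wins, which matches the behaviour the article complains about; in the second, the local /24 wins regardless of metric.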

  • windexh8er 13 hours ago

    This isn't a Windows problem. The OP would experience the same problem on Linux. I've run into this with SRs. I believe I may have even opened an issue with Tailscale to detect when a client is local to an exit and/or provide more fine-grained route ingestion depending on where the client is with respect to the SR.

    But... Again, not a Windows problem. It is easy to fix by just advertising a less specific (shorter-prefix) route, provided that doesn't clobber other things. By default the more specific route is chosen, so a less specific route advertised on the TS interface won't be selected.

  • muststopmyths 13 hours ago

    So, a virtual adapter advertises 100 Gbps link speed, but is not capable of delivering that, and the takeaway is "Windows kills..."?

    How do other OSes handle the situation of having two interfaces with identical routes to a given destination?

    I don't see a better solution than using link speed, but I haven't thought about it too deeply.

    • windexh8er 12 hours ago

      Tailscale shouldn't advertise a route that is local to the machine. This is a routing loop. The way SR route distribution works in Tailscale is that you accept all routes or nothing. Routing platforms have the concept of route filters to prevent accepting an advertised route that would create a loop.

      There are hacky ways around this without having to deal with metrics (just advertise a /23 instead of a /24 and the /24 will be selected by default). But if you've got contiguous subnets you may not be able to clobber the additional address space just to avoid the route.
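
A small Python sketch of why the /23 workaround can clobber neighbouring address space. The 192.168.0.0/24 LAN is a hypothetical example, not taken from the article.

```python
import ipaddress

# Hypothetical local LAN that the subnet router would normally advertise as-is.
lan = ipaddress.ip_network("192.168.0.0/24")

# Advertise the enclosing /23 instead, so the local /24 route stays more specific.
advertised = lan.supernet(new_prefix=23)

# Every /24 inside the advertised /23 other than the LAN itself is now also
# routed over the overlay, whether or not that was intended.
collateral = [n for n in advertised.subnets(new_prefix=24) if n != lan]

print("advertised:", advertised)
print("also captured:", [str(n) for n in collateral])
```

If the captured neighbour is a contiguous subnet you actually use elsewhere, the workaround trades one routing problem for another.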

      • RockRobotRock 12 hours ago

        I really thought Tailscale would automagically figure this out. If this were true in all cases, my internet would not work at all since it would try to reach my router through the Tailscale interface.

        It's odd.

        • windexh8er 4 hours ago

          The SR can't route for your gateway or else Tailscale itself would break its own connection. A gateway IP isn't treated the same as a subnet route.

        • readyplayeremma 9 hours ago

          You don’t have any specific routes to random internet addresses though. And Tailscale would not either. Unless your Windows server is running BGP, all your Internet traffic is hitting the default route.

  • insaneirish 12 hours ago

    I feel like this whole thing buries the lede a bit.

    Yes, turns out running overlay/VPN type things disrupts traffic patterns. This is a non-story.

    But we're talking about using WireGuard on a local network, so the actual interesting question is: why does it cause the performance to plummet? Is it an implementation issue or something more fundamental?

    I expect some performance impact. I don't expect a three orders of magnitude impact (which is what 355 KB/s implies).

    • thowawatp302 9 hours ago

      It’s TCP, so bandwidth-delay product applies, if the hairpin that gets the traffic back to the local LAN does anything non-trivial.
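
A rough Python sketch of the bandwidth-delay argument: a single TCP connection can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by RTT. The 64 KiB window and the RTT values are hypothetical and not measured from the article's setup; the function name is made up for illustration.

```python
def max_tcp_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound, in bytes per second, for one TCP connection: window / RTT."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes / rtt_seconds


WINDOW = 64 * 1024  # hypothetical effective window, in bytes

# The same window over increasingly long round trips (hypothetical RTTs).
for rtt_ms in (0.5, 5, 50):
    rate = max_tcp_throughput(WINDOW, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms -> at most {rate / 1e6:.1f} MB/s")
```

Under these assumptions, each tenfold increase in round-trip time cuts the ceiling tenfold, so added latency in the hairpin alone could account for a large drop without any loss of raw link capacity.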

    • EVa5I7bHFq9mnYK 11 hours ago

      I check "Allow local network access" under Exit Nodes, and then it transfers at max speed over local Ethernet.

      • windexh8er 3 hours ago

        This doesn't actually affect anything if you're accepting Tailscale SRs. The conflict the article describes is accepting a route advertised by Tailscale for the local network (the SR route) while on that same local network. This forces all traffic through the WireGuard interface; it's then routed to the SR and back out, because the Tailscale interface's metric is better than the hardware NIC's, owing to the advertised link speed. This is the root of the bandwidth issue.

        The "Allow local network access" is an IP filter that's put into place or not.

  • bGl2YW5j 12 hours ago

    Thanks to the author for this!

    What oddly coincidental timing ... I finished setup of Tailscale just yesterday and ran into this exact issue when testing it. I didn't think too much of it and blamed the USB connection I'm using to connect my external drive.

  • accrual 13 hours ago

    It was nice to see PowerShell could change the interface metric when the adapter GUI refused due to the empty IP field. I bet that check has been there since the 90s.

    It makes me a little happy when a new CLI is able to do something the old GUI cannot!

    • ygra 9 hours ago

      It's not just the new CLI. I guess you could have done the same with netsh for ages as well.

  • magicalhippo 12 hours ago

    I have my desktop PC connected to my TrueNAS box via both regular 1GbE via switch and a direct 10GbE link. I experienced similar issues where sometimes Windows would pick the sub-optimal interface.

    I decided to brute force it, by editing my hosts file on Windows and adding a custom entry for the static IP assigned to the 10GbE adapter in TrueNAS. So if my NAS was named "mynas" I'd add a "mynas10" entry in hosts file.

  • caconym_ 13 hours ago

    If Tailscale is being used for remote access to the author's LAN, why is it running on a desktop that's always physically connected to the LAN? I have a similar setup for remote access but using Wireguard instead; my main router (pfSense VM running on Proxmox like the author's thing) handles the tunnels and routing for the remote subnet(s), and it all Just Works. Only the devices that actually get used remotely need to be set up as Wireguard peers, and they're configured to disconnect from the tunnel when they're on my home wifi. IIUC Wireguard automatically does the setup/teardown of routes on those peers when it's toggled on/off.

    • RockRobotRock 12 hours ago

      >If Tailscale is being used for remote access to the author's LAN, why is it running on a desktop that's always physically connected to the LAN?

      Because it's probably not only used for that. Personally, I want to access my local network segment from anywhere, and at the same time SSH into a cloud box without exposing port 22 to the internet.

      Tailscale does the second one really well. I've also had problems with route loops which is why I've avoided the subnet router feature.

      • caconym_ 9 hours ago

        > Because it's probably not only used for that. Personally, I want to access my local network segment from anywhere, and at the same time SSH into a cloud box without exposing port 22 to the internet.

        In my Wireguard-based setup there is no difference between the former and the latter. Remote peers connect to my router via a single open Wireguard port and then routing goes both ways—remote to LAN, LAN to remote, and also remote to remote via my router. Machines on the LAN have routes to any other LAN or remote machine without needing multiple interfaces or any local VPN configuration.

        For some people Tailscale's features will be game changers (NAT hole punching, automatic DNS for all tailnet clients across multiple subnets, etc.) but I'm afraid OP may be using Tailscale as a crutch rather than getting his router sorted out properly, and the result is this weird redundancy of core network functions covering the same set of machines.

        It's not even really a Tailscale problem per se, though I guess if you have machines naively connected to a Tailscale "subnet router" analogous to how my network is set up, you may not be able to take advantage of the full Tailscale feature set.

    • jeroenhd 11 hours ago

      > If Tailscale is being used for remote access to the author's LAN, why is it running on a desktop that's always physically connected to the LAN?

      Tailscale has a few nice additional features as well, like automatic DNS assignment for hosts on the virtual network, generation of HTTPS certificates for those hosts, and, if you enable the right middleware in your locally run services, transparent authentication to web servers for computers on the network. If you're going all-in on Tailscale, you can use it to automate a lot of network management. That would require you to run Tailscale on all of your devices, though.

    • stego-tech 12 hours ago

      Because, for whatever reason I’ve yet to grasp, homelab folks like to implement Tailscale as some sort of “secure virtual network” abstraction layer - think something similar to zScaler ZPA - on top of their local LAN. To be fair, I didn’t think Tailscale did a good job explaining why this isn’t a great idea last time I tinkered with it in 2022.

      If you can juggle SSH keys and forward ports on your firewall, you can just run plain old WireGuard. Don’t use Tailscale as a network abstractor unless you know what you’re using it that way for, and why.

      • jauer 12 hours ago

        > Because, for whatever reason I’ve yet to grasp, homelab folks like to implement Tailscale as some sort of “secure virtual network” abstraction layer - think something similar to zScaler ZPA - on top of their local LAN.

        This is Tailscale's intended behavior, not a matter of how homelab folks like to implement it: https://github.com/tailscale/tailscale/issues/659#issuecomme...

        • RockRobotRock 12 hours ago

          Maybe I'm not understanding properly, but why can't my device ARP ping and handshake with the subnet router to determine that I'm on the local subnet and to stop routing it through Tailscale?

          • jauer 11 hours ago

            Tailscale intentionally overrides your device's routing table to force traffic between hosts in the same subnet to go over a Wireguard tunnel instead of bypassing it. They do this because they believe that the presumption that a local subnet is trustworthy is false.

          • lmm 12 hours ago

            It could, but the Tailscale devs don't consider "silently start leaking traffic to anyone on the local subnet" to be a desirable feature.

      • code_biologist 12 hours ago

        I needed access to my home NAS and linux GPU box while visiting family last year over the holidays. I was in a rush. I spent 45 minutes trying to get Wireguard configured and working, then tried Tailscale and had the network I was looking for in 15 minutes. I'm not a homelabber. I hate network admin.

        Is Just Works™ / being moron-resistant, with good first-party client apps, a bad reason to pick Tailscale?

  • Animats 11 hours ago

    Ah, non-transparent middlebox trouble.

  • wtcactus 6 hours ago

    Does this also happen in Zerotier?

    Don't get me wrong, I think Tailscale is absolutely great, I'm just interested in trying Zerotier for a while since it has integration with OPNsense (in the GUI; I know Tailscale works fine if you install the package and configure it manually).

  • hk1337 13 hours ago

    I don’t think this is exclusive to Windows. SMB is a crappy service for anything outside local LAN. I am not too familiar with Tailscale but from what I understand, it’s basically akin to a VPN.

    • not_a_bot_4sho 12 hours ago

      Article spoiler: the issue is Tailscale. SMB and Windows are red herrings.

      • EVa5I7bHFq9mnYK 11 hours ago

        True. I have exactly the same speed issues with SCP as with SMB. It also depends on the exit node used: some exit nodes give 10 MB/s, some give 1 MB/s. It doesn't work without exit nodes at all, due to cross-border blocking issues.

    • dawnerd 11 hours ago

      Sounds like, yeah, it’s a different issue here, but I can confirm using WireGuard absolutely destroys SMB performance. It’s not as bad on Windows, but on Mac it’s basically unusable.

    • karlgkk 11 hours ago

      “I don’t know what I’m talking about but here’s my opinion.”

      Thanks for your contribution

    • mixdup 13 hours ago

      Did you read the linked post? It actually has nothing to do with SMB

  • leshokunin 10 hours ago

    I’ve been curious as to why SMB seems to get little attention, and NFS even less. I had to jump through hoops to even get NFS working at all on Windows.

    I treated myself to 10GbE a while ago, and it feels like the protocol side of this is something that just gets overlooked. Unclear why. Maybe people just assume once it works, it works?