SMS delivery is not deterministic: routing defines behavior

(blog.bridgexapi.io)

2 points | by Bridgexapi 9 hours ago

8 comments

After working on SMS delivery systems, one thing stood out:

Most APIs expose sending, but hide execution.

You submit a message and get "delivered" back, but everything that actually determines behavior is hidden: routing, timing, execution path.

This becomes a real problem in systems like OTP, fraud alerts or transactional messaging, where timing matters more than delivery itself.

The same request can behave differently across executions, but without visibility into routing, you can’t explain or control it.
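One way to get some of that visibility from the outside is to record when each message is submitted and when its delivery receipt arrives, then compare latency per route. The sketch below is a minimal illustration, not any provider's API: `LatencyLog`, the route labels and the timestamps are all hypothetical, and it assumes you can attach some route or provider label to each message yourself.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class LatencyLog:
    """Tracks submit-to-receipt latency per message (hypothetical helper)."""
    submitted: dict = field(default_factory=dict)  # message_id -> (route, submit time)
    latencies: dict = field(default_factory=dict)  # route -> list of seconds

    def on_submit(self, message_id: str, route: str, now: float) -> None:
        # Call when the send request is accepted by the API.
        self.submitted[message_id] = (route, now)

    def on_receipt(self, message_id: str, now: float) -> None:
        # Call when the delivery receipt (e.g. a webhook) arrives.
        entry = self.submitted.pop(message_id, None)
        if entry is None:
            return  # receipt for a message we never recorded; ignore it
        route, sent_at = entry
        self.latencies.setdefault(route, []).append(now - sent_at)

    def summary(self) -> dict:
        # Per-route count, median and worst-case latency in seconds.
        return {
            route: {
                "count": len(values),
                "median": statistics.median(values),
                "max": max(values),
            }
            for route, values in self.latencies.items()
        }
```

Comparing the median against the maximum per route shows whether slow deliveries are spread evenly or concentrated on one path, which the delivery status alone does not tell you.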

This post breaks down why that happens and what control actually means at the infrastructure level.

Bridgexapi 9 hours ago

One thing I’m still trying to understand better:

How do people debug delivery issues today when timing is inconsistent?

Most tools seem to expose status, but not execution.

Curious how others approach this in production systems.

panny 9 hours ago

I spend a lot of time outside the country. Everyone seems to block VoIP for "security" now too. So please stop using SMS for 2FA. Seriously, even NIST says this is a bad idea, but everyone keeps doing it.

https://www.schneier.com/blog/archives/2016/08/nist_is_no_lo...

I have yubikeys. Lots of people can do authenticator codes on their phones. Stop using SMS already. As you're discovering, it's garbage for that purpose, even for you as the developer. Even emailing the code would be better.

Bridgexapi 9 hours ago

That’s fair, I agree SMS isn’t great for 2FA.

I think what’s interesting is that a lot of systems still rely on it anyway (reach, fallback, onboarding), but treat it like it’s deterministic.

In practice, I’ve seen more issues from unpredictability than just security — timing, routing behavior, stuff you can’t really see.

So even if teams accept the trade-offs, they still don’t really understand how delivery behaves.

Have you seen similar issues in systems that still use SMS as fallback?

panny 9 hours ago

I sent 100,000 SMS appointment reminders every day for over a decade. I resisted lead times under one day for a very long time, until I was forced to roll out lead times as short as one hour. I made extra sure, and got it in writing, that customers would be informed that hourly lead times may fail and be unrecoverable. Don't depend on it.

The shortest lead time was one hour, and you're talking about 45 seconds.

Bridgexapi 9 hours ago

Yeah that makes sense, especially at that scale.

What you’re describing is kind of what I keep running into too — people don’t really try to understand delivery, they just design around the fact that it’s unreliable.

Longer lead times, retries, fallback channels, etc.

Which works, but it also means the actual behavior stays a black box.
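For reference, the retry-plus-fallback pattern usually looks something like the sketch below. It is only an illustration under stated assumptions: `send_with_fallback` and the channel callables are hypothetical, each callable is assumed to return `True` on success, and the backoff values are arbitrary.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def send_with_fallback(
    channels: Sequence[Tuple[str, Callable[[str], bool]]],
    message: str,
    attempts_per_channel: int = 2,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try each channel in order; return the name of the one that worked."""
    for name, send in channels:
        for attempt in range(attempts_per_channel):
            try:
                if send(message):
                    return name
            except Exception:
                # Treated as a failed attempt; a real system would log this.
                pass
            if attempt < attempts_per_channel - 1:
                # Exponential backoff between retries on the same channel.
                sleep(backoff_seconds * (2 ** attempt))
    return None  # every channel failed
```

Note that nothing here explains why an attempt failed or how long it took. It only routes around the failure, which is exactly the black-box part.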

Did you ever notice differences between providers or routes, or was it basically opaque the whole time?

panny 8 hours ago

>they just design around the fact that it’s unreliable

Everything is unreliable. Always design for unreliable. That's my major gripe with most docs and tutorials from any of these services. They only describe what happens during success. They never go into detail on what happens when things go wrong. Your only option is to wait for it to blow up and learn from experience. In the meantime, assume it is going to blow up, try to catch it and log as much of the blowup as possible. Never assume it will work. Assume it won't work and be happy when it does.
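A rough sketch of what "catch it and log as much as possible" can look like, using only Python's standard `logging` module. The `send_logged` wrapper and the `send` callable it takes are hypothetical stand-ins for whatever client you actually use:

```python
import logging
import time

logger = logging.getLogger("sms")


def send_logged(send, to: str, body: str):
    """Call send(to, body); log timing and the full traceback on failure."""
    started = time.monotonic()
    masked = "***" + to[-4:]  # avoid writing full phone numbers to logs
    try:
        response = send(to, body)
    except Exception:
        elapsed = time.monotonic() - started
        # logger.exception records the message plus the active traceback.
        logger.exception("sms send failed to=%s elapsed=%.3fs", masked, elapsed)
        return None
    elapsed = time.monotonic() - started
    logger.info("sms send ok to=%s elapsed=%.3fs response=%r", masked, elapsed, response)
    return response
```

The caller gets `None` on failure and decides what to do next, so a failed send never takes down the job that triggered it.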

This is why you should lean toward something like an authenticator. You can control the whole experience. Rely on unreliable services as little as you can.

Bridgexapi 8 hours ago

[dead]