At a past job (hedge fund), my role was to co-ordinate investigations into why latency may have changed when sending orders.
A couple of quants had built a random forest regression model that could take inputs like time of day, exchange, order volume etc and spit out an interval of what latency had historically been for those conditions.
If the latency moved outside that range, an alert would fire and then I would co-ordinate a response with a variety of teams, e.g. trading, networking, Linux, etc.
If we excluded changes on our side as the culprit, we would reach out to the exchange and talk to our sales rep there, who might also pull in their networking team, etc.
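Roughly, the idea was something like the sketch below (not our actual model; the features, training data, and percentile interval are all illustrative). Each tree in the forest gives its own prediction, and the spread of those per-tree predictions forms the "historical" interval that the live latency is checked against:

```python
# Minimal sketch (not the firm's actual model): use the spread of per-tree
# predictions from a random forest as a "historical latency interval" and
# alert when an observed latency falls outside it.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: features = [hour_of_day, exchange_id, order_volume],
# target = observed order-send latency in microseconds.
X_train = np.array([[9, 0, 100], [10, 0, 250], [9, 1, 80], [15, 1, 500]] * 50)
y_train = np.array([120.0, 135.0, 180.0, 210.0] * 50) + np.random.normal(0, 5, 200)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

def latency_interval(features, lo_pct=5, hi_pct=95):
    """Interval of what latency 'has historically been' for these conditions,
    taken from the distribution of individual tree predictions."""
    preds = np.array([tree.predict([features])[0] for tree in model.estimators_])
    return np.percentile(preds, lo_pct), np.percentile(preds, hi_pct)

def check(features, observed_latency_us):
    lo, hi = latency_interval(features)
    if not (lo <= observed_latency_us <= hi):
        print(f"ALERT: latency {observed_latency_us:.0f}us outside [{lo:.0f}, {hi:.0f}]us")

check([9, 0, 100], observed_latency_us=300.0)  # would fire an alert here
```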
Some exchanges, EUREX comes to mind, were phenomenal at helping us identify issues. E.g. they once swapped in a cable that was a few feet longer than the old one and that's why the latency increased.
One day, it's IEX, of Flash Boys fame, that triggers an alert. Nothing changed on our side so we call them. We are going back and forth with the networking engineer and then the sales rep says, in almost hushed tones:
"Look, I've worked at other exchange so I get where you are coming from in asking these questions. Problem is, b/c of our founding ethos, we are actually not allowed to track our own internal latency so we really can't help you identify the root cause. I REALLY wish it was different."
I love this story b/c HN, as a technology focused site, often thinks all problems have technical solutions but sometimes it's actually a people or process solution.
Also, incentives and "philosophy of the founders" matter a lot too.
What kind of founding ethos doesn't allow tracking internal latency? Is their founding ethos "Never Admit Responsibility"? "Never Leave A Paper Trail"?
This company's official ethical foundation is "Don't Get Caught."
From the Wikipedia article on IEX: "It was founded in 2012 in order to mitigate the effects of high-frequency trading." I can see how they don't want to track internal latency as part of that, or at least not share those numbers with outsiders. That would just encourage high-frequency traders again.
One would hope for a more technical solution to HFT than willful ignorance lol. For example, they could batch up orders every second and randomize them.
Curious what your actual role was -- sounds very interesting! Project manager? Dev? Operations specialist? E.g. were you hired into this role, and what were the requisites?
I was what was called "Trade Desk".
Many firms have them and they are a hybrid of:
- DevOps (e.g. we help, or own, deployments to production)
- SRE (e.g. we own the dashboards that monitored trading and would manage outages etc)
- Trading Operations (e.g. we would work with exchanges to set up connections, cancel orders etc)
My background is:
- CompSci/Economics BA
- MBA
- ~20 years of basically doing the above roles. I started out supporting an in-house Order Management System at a large bank and went from there.
For more detail, here is my LinkedIn: https://www.linkedin.com/in/alex-elliott-3210352/
I also have a thread about the types of outages you see in this line of work here: https://x.com/alexpotato/status/1215876962809339904?s=20
(I have a lot of other trading/SRE related threads here: https://x.com/alexpotato/status/1212223167944478720?s=20)
Thanks for all the info!
I'm a front office engineer at a prop firm -- always interesting to get insight into how others do it.
We have fairly similar parallels, maybe with the exception of throwing new exchange connections to the dedicated networking group.
Always love watching their incident responses from afar (usually while getting impacted desks to put away the pitchforks). Great examples of crisis management, effectiveness, and prioritization under pressure, all while being extremely pragmatic about actual vs perceived risk.
(I'm sure joining KCG in August of 2012 was a wild time...)
You are very welcome!
It's definitely a job that you don't hear much about, but it has a lot of interesting positives for people who like technology and trading, especially if you prefer shorter-term, high-intensity work to long-term projects (which is more what developers do).
> Always love watching their incident responses from afar
I actually have a thread on that too: https://x.com/alexpotato/status/1227335960788160513?s=20
> (I'm sure joining KCG in August of 2012 was a wild time...)
And not surprisingly, a thread on that as well: https://x.com/alexpotato/status/1501174282969305093?s=20
May I know if someone with no trading knowledge can get into this field? Or do the new hires that you've seen generally have some background knowledge related to trading, etc.?
I did consider applying for a role in a very similar field, but figured I'd be fighting an uphill battle with no knowledge of trading/the stock market/etc.
You know how coders are expected to grind out leetcode interviews? For the finance fields, a common interview topic is what you read in “The Journal” (WSJ). So just stay on top of it for a few weeks, see some trends, etc.
> they once swapped in a cable that was a few feet longer than the old one and that's why the latency increased
That was not why. Possibly the cable made a difference (e.g. it had an open circuit that made the NICs back down to a lower speed, or it was noisy and caused retransmissions), but it wasn't the length per se.
All technical problems are people problems
> Pipelined replication: the sequencer assigns a sequence number immediately and ships the event to replicas in parallel. Matching doesn't wait for the replicas to acknowledge.
How does this avoid data loss if the lead sequencer goes down after acking but before the replica receives the write?
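For concreteness, here is a toy version of the pipelined pattern the quoted passage describes (hypothetical code, not the article's actual design). The sequence number is assigned and matching proceeds before any replica confirms, which is exactly the window the question is about:

```python
# Toy sketch of the pipelined pattern described above (hypothetical, not the
# article's actual implementation). The point is the ordering of steps: the
# sequence number is assigned and matching proceeds before replicas confirm.
from dataclasses import dataclass, field

@dataclass
class Sequencer:
    next_seq: int = 1
    replicas: list = field(default_factory=list)   # each replica is just a list of events here
    matched: list = field(default_factory=list)

    def handle(self, event, replica_link_up=True):
        seq = self.next_seq
        self.next_seq += 1
        # 1. Ship to replicas in parallel (fire-and-forget in this sketch).
        if replica_link_up:
            for r in self.replicas:
                r.append((seq, event))
        # 2. Match immediately -- no wait for replica acks.
        self.matched.append((seq, event))
        # 3. Ack back to the gateway/client.
        return seq

# The window the question asks about: if the link to replicas is down (or the
# send is still in flight) when the primary dies, the event exists only on the
# primary, even though the client already saw an ack.
primary = Sequencer(replicas=[[]])
ack = primary.handle("NEW_ORDER AAPL 100@190.00", replica_link_up=False)
print("client got ack for seq", ack, "but replica holds", primary.replicas[0])
```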
This is interesting but also just hilarious at a meta level. I was a "low frequency", i.e. manual, fundamentals-based, hedge fund investor for many years. In general I think HFT is a net benefit to liquidity when done in compliance with the text and spirit of regulations. But no real-world allocation of resources is improved by having to game transactions at this level of time granularity. This is just society pouring resources down a zero-sum black hole. Open to hearing contrary views, of course.
I've been wondering if the stock market would be more efficient if trades executed only every <small time interval> instead of continuously, i.e. every 1 second an opening-auction-style cross-book clearance happens. Orders would have to be on the book for a full interval to execute, to prevent last-millisecond rushes at the end of an interval.
I'm probably missing some second-order effects, but it feels like this would mitigate the need for race-to-the-bottom latencies and would also provide protection against fat-fingered executions, in that every trading algorithm would have a full second to arbitrage them.
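A toy sketch of that kind of periodic cross-book clearance (a uniform-price batch auction; not how any particular venue implements it): collect orders for the whole interval, then pick a single clearing price that maximizes matched volume.

```python
# Toy uniform-price batch auction: orders rest for the whole interval, then one
# clearing price is chosen and everything crossable trades at that price.
def clearing_price(bids, asks):
    """bids/asks: lists of (price, qty). Returns (price, volume) that maximizes
    matched volume, or None if nothing crosses."""
    prices = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best = None
    for p in prices:
        demand = sum(q for bp, q in bids if bp >= p)   # buyers willing to pay >= p
        supply = sum(q for ap, q in asks if ap <= p)   # sellers willing to sell <= p
        volume = min(demand, supply)
        if volume > 0 and (best is None or volume > best[1]):
            best = (p, volume)
    return best

bids = [(10.02, 300), (10.01, 500), (10.00, 200)]
asks = [(10.00, 400), (10.01, 300), (10.03, 100)]
print(clearing_price(bids, asks))  # (10.01, 700): 700 shares cross at 10.01
```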
this is exactly what many dark pools do
"continuous periodic auctions"
You could do this but the cost would be wider bid/ask spreads for all market participants. If you make it harder for market makers to hedge their position, they will collect a larger spread to account for that. A whole lot of liquidity can disappear in a second when news hits.
I’d rather have penny-wide spreads on SPY than restrict trading speed for HFTs. Providing liquidity is beneficial to everyone, even if insane amounts of money are spent by HFTs to gain an edge.
I wish the article had stuck with the technical topic at hand and left out the embellishment. In particular the opening piece talking about what is happening outside the exchange.
What happens outside the exchange really doesn’t matter. The ordering will not happen until it hits the exchange.
And that is why algorithmic traders want their algos in a closet as close to the exchange both physically and also in terms of network hops as possible.
The embellishment is because it's written at least partly by an LLM.
This article both undersells and oversells the technical challenge exchanges solve.
First, it is of course possible to apply horizontal scaling through sharding. My order on Tesla doesn't affect your order on Apple, so it's possible to run each product on its own matching engine, its own set of gateways, etc. Most exchanges don't go this far: they might have one cluster for stocks starting A-E, etc. So they don't even exhaust the benefits available from horizontal scaling, partly because this would be expensive.
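A minimal sketch of that coarse sharding (the symbol ranges are made up): each symbol maps deterministically to one matching engine, so the clusters never have to coordinate on ordering.

```python
# Sketch of coarse symbol-range sharding (ranges are illustrative): every
# symbol maps deterministically to one matching engine, so the shards never
# need to agree on a global ordering across symbols.
SHARDS = [
    ("A", "E", "matching-engine-1"),
    ("F", "M", "matching-engine-2"),
    ("N", "S", "matching-engine-3"),
    ("T", "Z", "matching-engine-4"),
]

def route(symbol: str) -> str:
    first = symbol[0].upper()
    for lo, hi, engine in SHARDS:
        if lo <= first <= hi:
            return engine
    raise ValueError(f"no shard for {symbol}")

print(route("AAPL"), route("TSLA"))  # matching-engine-1 matching-engine-4
```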
On the other hand, it's not just the sequencer that has to process all these events in strict order - which might make you think it's just a matter of returning a single increasing sequence number for every request. The matching engine which sits downstream of the sequencer also has to consume all the events and apply a much more complicated algorithm: the matching algorithm described in the article as "a pure function of the log".
Components outside of that can generally be scaled more easily: for example, a gateway cares only about activity on the orders it originally received.
The article is largely correct that separating the sequencer from the matching engine allows you to recover if the latter crashes. But this may only be a theoretical benefit. Replaying and reprocessing a day's worth of messages takes a substantial fraction of the day, because the system is already operating close to its capacity. And after it crashed, you still need to figure out which customers think they got their orders executed, and allow them to cancel outstanding orders.
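To make the "pure function of the log" point concrete, here is a toy replay (grossly simplified: limit orders only, price-time priority). Feeding the same sequenced events into an empty book always reproduces the same state, which is what makes recovery-by-replay possible at all, and also why it takes so long when the log holds a full day of messages.

```python
# Toy "pure function of the log": re-applying the same sequenced events to an
# empty book always yields the same state (simplified: limit orders only,
# price-time priority, no cancels).
from collections import defaultdict, deque

def apply_log(events):
    bids = defaultdict(deque)   # price -> FIFO of (order_id, qty)
    asks = defaultdict(deque)
    for seq, side, order_id, price, qty in events:     # already in sequencer order
        book, opposite = (bids, asks) if side == "B" else (asks, bids)
        crosses = (lambda p: p <= price) if side == "B" else (lambda p: p >= price)
        # Match against the best opposing prices first.
        for p in sorted(opposite, reverse=(side == "S")):
            if qty == 0 or not crosses(p):
                break
            queue = opposite[p]
            while queue and qty > 0:
                oid, oqty = queue[0]
                traded = min(qty, oqty)
                qty -= traded
                if traded == oqty:
                    queue.popleft()
                else:
                    queue[0] = (oid, oqty - traded)
            if not queue:
                del opposite[p]
        if qty > 0:                                    # remainder rests on the book
            book[price].append((order_id, qty))
    return bids, asks

log = [(1, "S", "o1", 10.01, 300), (2, "B", "o2", 10.01, 100), (3, "B", "o3", 10.00, 200)]
print(apply_log(log))  # deterministic: same log in, same book out
```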
> My order on Tesla doesn't affect your order on Apple
not necessarily
many exchanges allow orders in one instrument to match against another
(very, very common on derivatives exchanges)
Once sequencing is done, the matching algorithm can run with some parallelism.
For example, order A and order B might interact with each other... but they also might not. If we assume they do not, we can process them totally independently and in parallel, and then, only if we later determine they should have interacted, throw away the results and reprocess.
It is very similar to the way speculative execution happens in CPUs: assume something, then throw away the results if your assumption was wrong.
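A toy sketch of that optimistic scheme (much simplified; here "might interact" just means "same symbol"): process everything in parallel on the optimistic assumption, then detect conflicts and reprocess only those in the original sequence.

```python
# Toy sketch of optimistic parallel matching: assume orders don't interact
# (here, "interact" is approximated as "same symbol"), process them in
# parallel, then detect conflicts and redo the conflicting ones sequentially.
from concurrent.futures import ThreadPoolExecutor
from collections import Counter

def process_independently(order):
    # Placeholder for "match this order against a private copy of the book".
    return {"order_id": order["id"], "symbol": order["symbol"], "status": "done"}

def process_batch(orders):
    with ThreadPoolExecutor() as pool:
        speculative = list(pool.map(process_independently, orders))
    # Conflict detection: two orders on the same symbol might have interacted,
    # so their speculative results can't be trusted.
    counts = Counter(o["symbol"] for o in orders)
    ok = [r for r in speculative if counts[r["symbol"]] == 1]
    redo = [o for o in orders if counts[o["symbol"]] > 1]
    # Throw away the speculative results for conflicts and reprocess them in
    # the original sequence.
    ok += [process_independently(o) for o in sorted(redo, key=lambda o: o["seq"])]
    return ok

orders = [
    {"seq": 1, "id": "a", "symbol": "AAPL"},
    {"seq": 2, "id": "b", "symbol": "TSLA"},
    {"seq": 3, "id": "c", "symbol": "AAPL"},   # conflicts with "a" -> reprocessed
]
print(process_batch(orders))
```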
Off the cuff, I'd expect this leads to less improvement than you might think. The vast majority of orders, especially orders arriving in sequence close to one another, are likely on a small set of extremely liquid symbols, and usually all for prices at or near the top of the book for those symbols.
Happy to discuss more, might be off the mark... these optimizations are always very interesting in their theoretical vs actual perf impact.
In high-scale stateless app services this approach is typically used to lower tail latency: two identical service instances are sent the same request and whichever one returns faster "wins", which protects you from a bad instance or even one that happens to be heavily loaded.
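A small sketch of that hedged-request pattern (hypothetical service; real systems often hedge only after a delay so they don't double the load on every request):

```python
# Sketch of the "send to two instances, first response wins" pattern: race two
# identical calls and cancel the slower duplicate once one completes.
import asyncio, random

async def call_instance(name: str, request: str) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.2))   # simulated, variable latency
    return f"{name} handled {request}"

async def hedged_call(request: str) -> str:
    tasks = [asyncio.create_task(call_instance(n, request))
             for n in ("instance-a", "instance-b")]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()                                   # drop the slower duplicate
    return next(iter(done)).result()

print(asyncio.run(hedged_call("GET /quote/AAPL")))
```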
> Every modern exchange has a single logical sequencer. No matter how many gateways feed the system, all events flow into one component whose job is to assign the next sequence number. That integer defines the global timeline.
A notable edge case here is that if EVERYTHING (e.g. market data AND orders) goes through the sequencer, then you can, essentially, create a denial of service on key parts of the trading flow.
e.g. one of the first exchanges to switch to a sequencer model was famous for having big market data bursts and then huge order entry delays b/c each order got stuck in the sequencer queue. In other words, the queue would be 99.99% market data with orders sprinkled in randomly.
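A toy illustration of that head-of-line blocking (numbers made up): with one FIFO shared by market data and orders, an order's delay is whatever burst happens to be queued ahead of it.

```python
# Toy illustration of head-of-line blocking in a single sequencer queue: one
# FIFO shared by market data and orders. Numbers are made up; the point is
# that the order's delay is set by the burst queued ahead of it.
from collections import deque

PROCESS_TIME_US = 1.0                       # pretend the sequencer spends 1us per event

queue = deque([("md", i) for i in range(10_000)])   # a market-data burst already queued
queue.append(("order", "NEW_ORDER AAPL 100@190.00"))

elapsed_us = 0.0
while queue:
    kind, payload = queue.popleft()
    elapsed_us += PROCESS_TIME_US
    if kind == "order":
        print(f"order sequenced after {elapsed_us / 1000:.1f} ms behind the burst")
```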
why would market data go through the sequenced stream on an exchange?
for an exchange: market data is a projection of the order book, an observer that sits on the stream but doesn't contribute to it
and client ports have rate limits
B/c, by design, you want the archived stream of events to include everything.
e.g. a lot of these systems have a "replay" node that can be used by components that just restarted. You want the replay to include ALL of the messages seen so you can rebuild the state at any given point.
(There are, of course, tradeoffs to this so I'm just commenting on the "single sequencer" design philosophy)
by definition: an exchange doesn't need any reference to outside market data
even for systems built on a sequencer which do (e.g. an OMS), the volume is too large
the usual strategy, for processes which require it, is to sample it from outside, then stamp it as a command to the sequencer
which maintains the invariants
(my background: I have been a developer on one of Mike Blum's original sequencers)
How long can exchanges keep scaling their sequencer systems (which are sequential) vertically? Trading volume is rising over time at a higher rate than low-latency tech is advancing.
Always fun to read about HFT. If anyone wants to learn about the Order Book data structure you can find it in JS here:
https://github.com/rhodey/limit-order-book
https://www.npmjs.com/package/limit-order-book
The title is obviously the wrong way around: exchanges turn distributed logs into order books. The distributed part is a resilience decision, not essential to the design (technically, writing to a disk would give persistence with less ability to recover, or with some potential gaps in the case of failure; remember there is a sequence published on the other end too, the market data feed). As noted in the article, the sequencer is a single-threaded, non-parallelisable process. Distribution is just a configuration of that single-threaded path. Parallelisation is feasible to some extent by sharding across order books themselves (dependencies between books may complicate this).
It would not surprise me at all if the sequencing step was done via FPGA processing many network inputs at line rate with a shared monotonic clock. This would give it some amount of parallelism.
Good point: sequencing is very minimal, so some parallelism is feasible that way, but the pipeline is not that deep, at least ideally. Of course, if people are chasing nanoseconds, it may make sense.
Very interesting. I wanted to know who the author is, but the site doesn't seem to have any readily available information on them.
After reading some other articles on the site, I have a feeling that it could be written by AI.
Smells of AI writing: "Timestamps aren't enough. Exchanges need a stronger ordering primitive." etc
Yes, this is AI-assisted writing, I'm not hiding from it. But this isn't just copy-pasting: I still spend up to 15 hours working on each article.
Interesting comment, I "felt" the AI too in an indescribable way. What are some obvious tells?
The incessant bullet lists and the conclusion titled "Conclusion" give it away. And above all, the complete lack of "voice". You can tell when a human is speaking and when a sanitised amorphous blob of averageness is speaking.