Super interesting! You can click the "live" link in the header to see how they performed over time. The (geometric) average result at the end seems to be that the LLMs are down 35 % from their initial capital – and they got there in just 96 model-days. That's a daily return of roughly -0.45 %, or a yearly return of -81 %, i.e. practically wiping out the starting capital.
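For anyone who wants to check the compounding, a quick back-of-the-envelope sketch in Python (assuming a flat -35 % over 96 model-days):

    total_growth = 0.65                      # -35 % over the whole run
    days = 96                                # model-days
    daily = total_growth ** (1 / days) - 1   # ~ -0.45 % per model-day
    yearly = (1 + daily) ** 365 - 1          # ~ -81 % annualized
    print(f"{daily:.2%} per day, {yearly:.0%} per year")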
Although I lack the maths to determine it numerically (depends on volatility etc.), it looks to me as though all six are overbetting and would be ruined in the long run. It would have been interesting to compare against a constant fraction portfolio that maintains 1/6 in each asset, as closely as possible while optimising for fees. (Or even better, Cover's universal portfolio, seeded with joint returns from the recent past.)
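For reference, Cover's universal portfolio can be approximated by Monte Carlo: sample constant-rebalanced portfolios from the simplex and weight each by the wealth it would have accumulated so far. A rough sketch, not a production implementation (the function and variable names are mine; it assumes a NumPy array of per-period simple returns):

    import numpy as np

    def universal_portfolio(returns, n_samples=2000, seed=0):
        # returns: (T, N) array of simple per-period returns
        rng = np.random.default_rng(seed)
        T, N = returns.shape
        candidates = rng.dirichlet(np.ones(N), size=n_samples)  # points on the simplex
        cand_wealth = np.ones(n_samples)
        wealth, curve = 1.0, []
        for t in range(T):
            # universal weights = wealth-weighted average of candidate portfolios
            b = (cand_wealth[:, None] * candidates).sum(axis=0) / cand_wealth.sum()
            wealth *= 1.0 + returns[t] @ b
            curve.append(wealth)
            # update each candidate's hypothetical wealth after this period
            cand_wealth *= 1.0 + candidates @ returns[t]
        return np.array(curve)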
I couldn't resist starting to look into it. With no costs and no leverage, the hourly rebalanced portfolio just barely outperforms 4/6 coins in the period: https://i.xkqr.org/cfportfolio-vs-6.png. I suspect costs would eat up many of the benefits of rebalancing at this timescale.
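For what it's worth, the no-cost, no-leverage version is only a few lines. A sketch assuming a pandas DataFrame `prices` of hourly closes, one column per coin (the names are mine):

    import pandas as pd

    def constant_fraction_curve(prices: pd.DataFrame) -> pd.Series:
        # Rebalancing to equal weights every bar means the portfolio's
        # per-bar return is just the mean of the individual coin returns.
        rets = prices.pct_change().dropna()
        return (1.0 + rets.mean(axis=1)).cumprod()

    # Buy-and-hold comparison for each coin:
    # hold_curves = prices / prices.iloc[0]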
This is not too surprising, given the similarity of coin returns. The mean pairwise correlation is 0.8, the lowest is 0.68. Not particularly good for diversification returns. https://i.xkqr.org/coinscatter.png
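The correlation numbers are equally quick to reproduce from hourly returns (again just a sketch, reusing the `prices` frame from the snippet above):

    import numpy as np

    rets = prices.pct_change().dropna()
    corr = rets.corr().values
    off_diag = corr[np.triu_indices_from(corr, k=1)]
    print(off_diag.mean(), off_diag.min())   # mean and lowest pairwise correlation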
> difficulty executing against self-authored plans as state evolves
This is indeed also what I've found trying to make LLMs play text adventures. Even when given a fair bit of help in the prompt, they lose track of the overall goal and find some niche corner to explore very patiently, but ultimately fruitlessly.
Agreed, and I'd also love to see a baseline of human performance here, both of experienced quant traders and of fresh grads who know the theory but never did this sort of trading and aren't familiar with the crypto futures market.
As someone who trades crypto semi-professionally, this was one of the toughest trading periods I've ever seen, and it included a massive liquidation event on the 10th of October that wiped out over $20B in capital. Any trader who broke even in this period likely outperformed. I know some very, very good traders who got wiped out on leverage on the 10th of October when stop losses didn't trigger and prices plummeted to 2021 levels (still no clarity why).
BTC also performed abysmally during this period with a sustained chop down from $126k to $90k.
Note that the 10th of October is before the trading period in this experiment. If anything, autoregression over shorter timescales would suggest that entering after the 10th of October was a good idea!
> find some niche corner to explore very patiently, but ultimately fruitlessly.
What, so they're better at my hobbies than me? Someone give Claude a 3d printer!
I was chatting to a friend in the space. He's experienced in both trading and LLMs, and has gone all-in on using LLMs to get his day-to-day coding done. Now he's working on the model to end all models, which is a fairly ambitious way to put it, but it throws off some interesting conversations.
You need domain knowledge to get this to work. Things like "we fed the model the market data" are actually non-obvious. There might be more than one way to pre-process the data, and what the model sees will greatly affect what actions it comes up with. You also have to think about corner cases, e.g. when DeepMind applied its approach to StarCraft (AlphaStar), they had to put restrictions on the action rate, that kind of thing. Otherwise the model gets stuck in an imaginary money fountain.
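To make the pre-processing point concrete, here's a hypothetical sketch: the same price series can be handed to a model in very different forms, and each one suggests different actions (the column names and the 24-bar window are made up):

    import numpy as np
    import pandas as pd

    def make_features(close: pd.Series) -> pd.DataFrame:
        return pd.DataFrame({
            "close": close,                           # raw price level
            "log_ret": np.log(close).diff(),          # stationary-ish increments
            "zscore_24": (close - close.rolling(24).mean())
                         / close.rolling(24).std(),   # level relative to recent range
        })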
But yeah, the AI thing hasn't passed by the quant trading community. A lot is going on, with AI trading teams being hired at various shops.
You can vibe code in this space as an individual because practically everything you are going to write is already in the training data.
The big quant hedge funds have been using machine learning for decades. I took the Coursera RL-in-finance class years ago.
The idea you are going to beat Two Sigma at their own game with tokens is just an absurdity.
Personally, I think any individual on their own that claims they are doing anything in the algorithmic / ML high frequency space is full of shit.
I could talk like I am too and sound really impressive to someone outside the space. That is much different though than actually making money on what you claim you are doing.
It reminds me of an artist friend when I was younger. She was an artist and I quite liked her paintings. She would tell everyone she is an artist. She was also an encyclopedia when it came to anything art related. She wasn't actually selling much art though. She lived off the $10k a month allowance her rich father gave her. She wasn't even being dishonest but when you didn't know the full picture a person would just assume she was living off her art sales.
> There might be more than one way to pre-process the data
I'm honestly more hopeful about AI replacing this process than the core algorithmic component, at least directly. (AI could help write the latter. But it's immediately useful for the former.)
Today it's clear that there are limitations to LLMs.
But I also see an incredible growth curve in LLMs' improvement. Two years ago, I wouldn't have expected LLMs to one-shot a web application or help me debug obscure bugs, and two years later I've been proven wrong.
I completely believe that trading is going to be saturated with AI traders in the future. And being able to predict and detect AI trading patterns is going to be an important lever for human traders, if they still exist.
> I completely believe that trading is going to be saturated with ai traders in the future
That's probably good news for us index fund investors. We need people to believe they're going to beat the market.
The limits of LLMs for systematic trading were and are extremely obvious to anybody with a basic understanding of either field. You may as well be flipping a coin.
I agree. Plus it's way too short a timeframe to evaluate any trading activity seriously.
But I still think the experiment is interesting because it gives us insight into how LLMs approach risk management, and what effects on that we can have with prompting.
So what are the limits, given that you seem knowledgeable about it?
At least a coin is faster and more reliable.
20 years ago NNs were considered toys and it was "extremely obvious" to CS professors that AI couldn't be made to reliably distinguish between arbitrary photos of cats and dogs. But then in 2007 Microsoft released Asirra as a captcha problem [0], which prompted research, and we had an AI solving it not that long after.
Edit - additional detail: The original Asirra paper from October 2007 claimed "Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it" [0]. It took Philippe Golle from Palo Alto a bit under a year to get "a classifier which is 82.7% accurate in telling apart the images of cats and dogs used in Asirra" and "solve a 12-image Asirra challenge automatically with probability 10.3%" [1].
Edit 2: History is chock-full of examples of human ingenuity solving problems for very little external gain. And here we have a problem where the incentive is almost literally a money printing machine. I expect progress to be very rapid.
[0] https://www.microsoft.com/en-us/research/publication/asirra-...
[1] https://xenon.stanford.edu/~pgolle/papers/dogcat.pdf
The Asirra paper isn't from an ML research group. The statement "Barring a major advance in machine vision, we expect computers will have no better than a 1/54,000 chance of solving it" is just a statement of fact - it wasn't any form of prediction.
If you read the paper you'll note that they surveyed researchers about the current state of the art ("Based on a survey of machine vision literature and vision experts at Microsoft Research, we believe classification accuracy of better than 60% will be difficult without a significant advance in the state of the art.") and noted what had been achieved at PASCAL 2006 ("The 2006 PASCAL Visual Object Classes Challenge [4] included a competition to identify photos as containing several classes of objects, two of which were Cat and Dog. Although cats and dogs were easily distinguishable from other classes (e.g., “bicycle”), they were frequently confused with each other.")
I was working in an adjacent field at the time. I think the general feeling was that advances in image recognition were certainly possible, but no one knew how to get above the 90% accuracy level reliably. This was in the day of hand coded (and patented!) feature extractors.
OTOH, stock market prediction via learning methods has a long history, and plenty of reasons to think that long term prediction is actually impossible. Unlike vision systems there isn't another thing that we can point to to say that "it must be possible" and in this case we are literally trying to predict the future.
Short term prediction works well in some cases in a statistical sense, but long term isn't something that new technology seems likely to solve.
Maybe I misunderstand, but it seems that there's nothing in your comment that contradicts any aspect of mine.
Regarding image classification. As I see it, a company like Microsoft surveying researchers about the state of the art and then making a business call to recommend the use of it as a captcha is significantly more meaningful of a prediction than any single paper from an ML research group. My intent was just to demonstrate that it was widely considered to be a significant open problem, which it clearly was. That in turn led to wider interest in solving it, and it was solved soon after - much faster than expected by people I spoke to around that time.
Regarding stock market prediction, of course I'm not claiming that long term prediction is possible. All I'm saying is that I don't see a reason why quant trading could be used as a captcha - it's as pure a pattern matching task as could be, and if AIs can employ all the context and tooling used by humans, I would expect them to be at least as good as humans within a few years. So my prediction is not the end of quant trading, but rather that much of the work of quants would be overtaken by AIs.
Obviously a big part of trading at the moment is already being done by AIs, so I'm not making a particularly bold claim here. What I'm predicting (and I don't believe that anyone in the field would actually disagree) is that as tech advances, AIs will be given control of longer trading time horizons, moving from the current focus on HFT to day trading and then to longer term investment decisions. I believe that there will still be humans in the loop for many many years, but that these humans would gradually turn their focus to high level investment strategy rather than individual trades.
What makes trading such a special case is that as you use new technology to increase the capability of your trading system, other market participants you are trading against will be doing the same; it's a never-ending arms race.
That doesn't mean it doesn't work. That means it does work!
If other market participants chose not to use something then that would show that it doesn't work.
. . . "The (geometric) average result at the end seems to be that the LLMs are down 35 % from their initial capital – and they got there in just 96 model-days. That's a daily return of -0.6 %, or a yearly return of -81 %, i.e. practically wiping out the starting capital."
Proves that LLMs are nowhere near AGI.
The vast majority of intelligent humans cannot profitably trade on intraday timeframes
Hyperliquid now has select tokenized equities as well. Would love to see how these models perform when trading equities
I've been following these for a while and many of the trades taken by DeepSeek and Qwen were really solid
I don't think betting on crypto is really playing to the strengths of the models. I think giving news feeds and setting it on some section of the S&P 500 would be a better evaluation.
Given that LLMs can't even finish Pokemon Red, how would you expect them to be able to trade futures?
(Unless you're a marketer) It makes a lot more sense to build a benchmark before the capabilities are there.
Because trading is mainly number-based, unlike Pokemon Red?
I'll bite: What part of the game, which is encoded entirely by a finite set of numbers, takes input as numbers, provides output as numbers, and is processed by a CPU that acts in a discrete digital space, cannot be represented by numbers?
Hey! That wasn't easy!
I always felt that emotions, instincts, fear, greed, courage, and pain are elements of a self-aware conscious loop system that can't be replicated accurately in a digital system, and that seasoned successful traders realize and exploit the fact that the activity is largely a psychological one. I'm not talking about neutral plays where you can absorb market fluctuations in the short term to extract 1~2% a week, but directional trades that almost all traders make (regardless of what exotic option strategies they employ).
Also, the other curious thing about the markets is their ability to destroy any persistent trading system by reverting to their core stochastic properties, and their constant ebb and flow from stability to instability that crescendos into systemic instability that rewrites the rules all over again.
I've tried all sorts of ways to do this, and without being a large institution able to absorb the noise on neutral plays or do legal quasi insider trading via proximity, for the average joe the emotional/psychological hardness you need to survive and be in the <1% of traders is simply too much. It's not unlike any other sport or art: many dream the dream but only a few get interviewed and written about.
Rather, I think to myself that the best trade is the simplest one: buy shares or invest in a business with money or time (strongly recommend against this unless you have no other means) and sell it at a higher price, or maintain a long-term DCF from a business you own as leverage/collateral to arbitrage whatever rate your central bank sets on assets that are, or will be, in demand.
To me it's clear where LLMs fit and where they don't, but ultimately they cannot, will not, must not replace your own agency.
You don't actually need nanosecond latency to trade effectively in futures markets but it does help to be able to evaluate and make decisions in the single-digit milliseconds range. Almost no generative model is able to perform inference at this latency threshold.
A threshold in the single-digit milliseconds range allows the rapid detection of price reversals (signaling the need to exit a position with the least loss) in even the most liquid of real futures contracts (not counting rare "flash crash" events).
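As a toy illustration of why single-digit milliseconds is enough headroom (this is not a real signal, just a hypothetical sketch): a reversal check over a rolling window of mid prices is a handful of vectorized operations, well under a millisecond on modern hardware.

    import numpy as np

    def reversal_flag(mids: np.ndarray, fast: int = 20, slow: int = 200) -> bool:
        # Flags a potential downside reversal when the short-horizon mean
        # crosses back below the longer-horizon mean of recent mid prices.
        fast_prev = mids[-fast - 1:-1].mean()
        fast_now = mids[-fast:].mean()
        slow_now = mids[-slow:].mean()
        return fast_prev > slow_now and fast_now <= slow_now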
From the article:
> The models engage in mid-to-low frequency trading (MLFT) trading, where decisions are spaced by minutes to a few hours, not microseconds. In stark contrast to high-frequency trading, MLFT gets us closer to the question we care about: can a model make good choices with a reasonable amount of time and information?
This is true for some classes of strategies. At the same time there are strategies that can be profitable on longer timeframes. The two worlds are not mutually exclusive.
Yes, but LLMs can barely cope with following complex software tutorials in order. Why would you reasonably expect them, unprompted, to understand time well enough to trade and turn a profit?
My comment makes no such claim. I wrote about different timeframes that trading strategies operate on.
Are language models really the best choice for this?
Seems to me that the outcome would be near random because they are so poorly suited. Which might manifest as
> We also found that the models were highly sensitive to seemingly trivial prompt changes
No, LLMs are not a good choice for this – as the results show! If I had to guess, they're experimenting with LLMs for publicity.
Exactly. This is a performance by a really bad method actor.
they're tools. treat them as tools.
Since they're so general, you need to explore if and how you can use them in your domain. Guessing "they're poorly suited" is just that, guessing. In particular:
> We also found that the models were highly sensitive to seemingly trivial prompt changes
this is all but obvious to anyone who has seriously looked at deploying these; that's why there are some very successful startups in the evals space.
> guessing 'they're poorly suited' is just that, guessing
I have a really nice bridge to sell you...
This "failure" is just a grab at trying to look "cool" and "innovative" I'd bet. Anyone with a modicum of understanding of the tooling (or hell experience they've been around for a few years now, enough for people to build a feeling for this), knows that this it's not a task for a pre-trained general LLM.
I think you have a different idea of what I'm saying than what I'm actually saying.
This is very thoughtful and interesting. It's worth noting that this is just a start and in future iterations they're planning to give the LLMs much more to work with (e.g. news feeds). It's somewhat predictable that LLMs did poorly with quantitative data only (prices) but I'm very curious to see how they perform once they can read the news and Twitter sentiment.
I would argue that sentiment classification is where LLMs perform best. Folks are already using them for precisely this purpose; some have even built a public index out of it.
what index ?
Not only can I guarantee the models are bad with numbers; unless it's a highly tuned and modified version, they're too slow for this arena. Stick to attention transformers in purpose-built model designs, which have much lower latencies than pre-trained LLMs...
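For contrast, a purpose-built sequence model over recent returns can be tiny. A hypothetical PyTorch sketch (the architecture and sizes are purely illustrative): with a few hundred thousand parameters, inference latency is in a completely different class from serving a general LLM.

    import torch
    import torch.nn as nn

    class TinyPriceTransformer(nn.Module):
        def __init__(self, d_model=64, n_heads=4, n_layers=2, window=128):
            super().__init__()
            self.embed = nn.Linear(1, d_model)             # one feature: per-bar return
            self.pos = nn.Parameter(torch.zeros(window, d_model))
            layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               dim_feedforward=128, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, 1)

        def forward(self, returns):                        # (batch, window, 1)
            x = self.encoder(self.embed(returns) + self.pos)
            return self.head(x[:, -1])                     # predicted next-step return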
Crazy how people continue to treat LLMs like they’re anything more than a record of past human knowledge and are then surprised when they can’t predict the future.
Even ChatGPT knows why LLMs for quant trading would never work.
>>LLMs are achieving technical mastery in problem-solving domains on the order of Chess and Go, solving algorithmic puzzles and math proofs competitively in contests such as the ICPC and IMO.
I don't think LLMs are anywhere close to "mastery" in chess or go. Maybe a nitpick, but the point is that an NN created to be good at trading is likely to outperform LLMs at this task the same way NNs created specifically to be good at board games vastly outperform LLMs at those games.
"Maybe a nitpick but the point is that a NN created to be good at trading is likely to outperform LLMs at this task the same way way NNs created specifically to be good at board games vastly outperform LLMs at those games."
Disagree. Go and chess are games with very limited rules. Successful trading, on the other hand, is not so much an arbitrary numbers game as it is about analyzing events in the news happening right now. Agentic LLMs that do this and buy and sell accordingly might succeed here.
(Not what they did here, though
"For the first season, they are not given news or access to the leading “narratives” of the market.")
LLMs can do language but not much else: not poker, not trading, and definitely not intelligence.
Cool experiment, but it’s nothing more than a random walk.
You simply will lose trading directly with an LLM. Mapping the dislocation by estimating the percentage of LLM trading bots is useful, though.
Isn’t that what Renaissance Technology does?
> Isn’t that what Renaissance Technology does?
No.
At the end of the day it all comes down to input data. There are a lot of things you can do to collect proprietary data to give you an edge.
That's funny because that advice is _directly_ counter to what most HFT quants say
Right, because they will tell you exactly how they generate alpha for all the world to see. It’s worth mentioning quant is not all HFT.