BTW before anyone gets any bright ideas about trying this in real life:
This is called "layering" and is often a criminal offense [1], and banks have effective means to detect it (as they are required to by law). In particular, they are highly attuned to flows of money that split large sums of cash into smaller amounts that the would-be money launderer thinks will be below some static threshold. The detections that catch this have been in place at every bank I've worked with (since around 2010, when I first touched any anti-money-laundering work, and they were already very mature and well-established by then) [2].
You are highly unlikely to get away with it unless you have already been doing it for some time or know someone who has and is willing to share their techniques.
[1] Obviously or not, it's only a criminal offense if you're doing it to evade detection (e.g. for tax fraud, or to avoid paying a court settlement, or something similar) or are otherwise laundering the proceeds of crime - that sort of thing. If you're just doing it for your own amusement, it will probably get you added to a watchlist of suspicious accounts at your bank, which they will monitor extra closely, and it may get you investigated by various financial crime enforcement organisations. Even if you haven't done anything wrong, I imagine that would be stressful and time-consuming.
[2] To give people who are not part of this world an idea: at least in the UK and US, all employees of any kind of deposit-taking institution are required by regulators to take annual anti-money-laundering training, which covers layering and how to detect it. So literally every bank employee should know what this pattern looks like and should be on the lookout for it.
> This is called "layering"
I believe it is actually "structuring".
Layering is a step in money laundering where layers of legitimacy are added as money moves around. Structuring is breaking larger transactions into smaller transactions in order to avoid detection. Smurfing is similar to structuring; honestly, I don't understand the difference well enough to explain it, but in my experience the two terms are often used interchangeably.
edit: Here is a short educational video on the topic of money laundering: https://www.youtube.com/watch?v=RhsUHDJ0BFM&t=90s
Ooh yes. Thanks for the correction.
This is Subset Sum and is famously NP-complete, so, no, an LP doesn't work. However, the problem is only weakly NP-hard, and there is a pseudo-polynomial time algorithm running in O(M · n) time, where M is the target sum and n is the number of accounts.
The problem, perhaps most of all, is that it takes the same amount of space.
PS: it's not a permutations problem but a subsets problem. While there are n! permutations, there are "only" 2^n subsets.
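To make the pseudo-polynomial algorithm concrete, here is a minimal Python sketch of the standard reachable-sums DP; the balances and target below are made up for illustration:

    def can_make(amounts, target):
        """Pseudo-polynomial subset-sum feasibility check.

        Runs in O(target * n) time for n amounts: each amount is combined
        with every sum reachable so far, and sums above target are pruned.
        """
        reachable = {0}  # sums achievable using some subset of the amounts seen so far
        for a in amounts:
            reachable |= {s + a for s in reachable if s + a <= target}
        return target in reachable

    # Hypothetical example: can some subset of these balances total exactly 100?
    print(can_make([5, 20, 35, 40, 50, 65], 100))  # True: 5 + 20 + 35 + 40

This only answers the yes/no question; recovering which amounts make up the sum is what drives the space discussion further down the thread.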
This is a MIP. CBC is an open source solver with a very basic implementation of branch & bound, cuts and search heuristics.
Any modern commercial solver (Gurobi, Xpress, IBM-CPLEX, COPT) can solve problems of this size pretty fast.
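Not the article's code, but as a rough sketch of what the binary formulation looks like, here it is in PuLP (which drives CBC by default); the balances and target are placeholders:

    import pulp  # pip install pulp; bundles a CBC binary

    # Hypothetical data: account balances and the target sum to reconstruct.
    balances = [137, 212, 349, 405, 518, 760, 891]
    target = 1103  # here 137 + 212 + 349 + 405

    prob = pulp.LpProblem("subset_sum", pulp.LpMinimize)

    # One binary variable per account: 1 if that balance is part of the sum.
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(balances))]

    # The objective is arbitrary for a pure feasibility question; minimizing
    # the number of accounts used gives the solver something to optimize.
    prob += pulp.lpSum(x)
    prob += pulp.lpSum(b * xi for b, xi in zip(balances, x)) == target

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    chosen = [b for b, xi in zip(balances, x) if xi.value() and xi.value() > 0.5]
    print(pulp.LpStatus[prob.status], chosen)

The commercial solvers mentioned above accept essentially the same formulation through their own APIs; the difference is how large an instance they can chew through quickly.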
I agree that you can solve it using MILP, but I would only say that commercial solvers will solve many instances of the problem pretty fast. Even 3SUM is conjectured to require essentially Ω(n²) time, so given a large enough n, some instances will take a long time, regardless of solver.
This is assuming exhaustive search.
This is not what math programming does. Theoretically, if you derive tight enough cuts for the problem and the solution lies at a vertex of the LP relaxation, you can get the optimal solution in polynomial time.
> Theoretically, if you derive tight enough cuts for the problem and the solution lies at a vertex of the LP relaxation, you can get the optimal solution in polynomial time.
I hate it when people cloak very obvious things in jargon to make them seem more insightful and disguise how obvious they are. What you're basically saying is: if you're lucky and win the lottery, then you'll be rich.
I.e. yes, we all know that if you luck out, then interior point will give you the optimum in polynomial time. But the whole reason MIP is NP-hard is exactly that deriving those cuts in general isn't possible (at least currently).
> This is not what math programming does
So what Gurobi/CPLEX/etc. actually do is first try a bazillion symmetry transformations, warm-start heuristics, and so on, but ultimately it does boil down to search.
No and No.
This way of thinking describes the plateau that MIP solvers hit in the 90s.
Then in the 2000s they started implementing actual math (mainly valid cuts at the root node, but also during the search), which is what gave us the exponential speedups we enjoy today.
So you would have been right 30 years ago. You are wrong in 2024.
> No and No.
> which is what gave us the exponential speedups we enjoy today
You're just patently wrong because
1. P=?NP is still open
2. You can consult the implementation of any solver
There's really nothing more to be said about it because both of these bullets are formal proofs that you are wrong.
The simplex algorithm should never work, given its exponential worst-case performance. Yet it solves the vast majority of LP problems, including ones with billions of variables. It is the default algorithm in every solver, despite the fact that we have polynomial interior-point methods at our disposal. Go figure.
Worst-case performance or NP-completeness is not a predictor of how fast we can solve an optimization problem.
> The simplex algorithm should never work, given its exponential worst-case performance.
Note, however, that it has been proven that if you slightly perturb the inputs, the expected complexity becomes polynomial, meaning the inputs for which it's exponential are really just pathological cases and it behaves polynomially in practice.
Man, there was a lot of obtuse argument to wade through to get to this post. This is an interesting fact. Apparently this is the source?
https://arxiv.org/abs/cs/0111050
> should never work, given its exponential worst-case performance
This is again not true. We know why it works well. See the paper "Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time" by Spielman and Teng.
We only started understanding why it works so well some 30 years after its first implementation (in the 1950s)? The paper you mention was written in the 2000s, while the simplex method had dominated the field since at least the 70s.
Meanwhile, the GP would have discarded the simplex algorithm due to its worst-case performance.
Honestly, the parent is pretty accurate. No one is claiming that P = NP. However, the technology to solve mixed-integer programs has improved dramatically over the last 30 years, and that improvement in performance has outpaced hardware speedups by multiple orders of magnitude. It's the algorithms.
I just went to pull up some numbers. The following comes from a talk that Bob Bixby gave at ISMP in 2012. He is the original author of CPLEX and one of the current owners of Gurobi. Between 1991 and 2007, CPLEX achieved a 29530x speedup in their solvers. Their 1997/98 breakthrough year attributes the following speedups: cutting planes 33.3x, presolve 7.7x, variable selection 2.7x, node presolve 1.3x, heuristics 1.1x, dive probing 1.1x. I don't have a paper reference for these numbers and I don't think he has published them, but I was at the talk.
The point is that integer programming solvers perform unreasonably well. There is theory as to why. Yes, there is still a lot of searching. However, search in and of itself is not sufficient to solve the problems that we regularly solve now. Further, that increase in performance is not just heuristics.
Here is the paper
https://www.emis.de/journals/DMJDMV/vol-ismp/25_bixby-robert...
FYI we’ve probably crossed paths :)
Laugh. Probably! I gave a talk at that conference titled "Software Abstractions for Matrix-Free PDE Optimization with Cone Constraints." I still work in the field, so if you want to talk algorithms sometime, feel free to send me an email. I keep my email off of HN to limit spam, but if you search for the lead author on that presentation, it should list my website.
This is also a problem for accountants, trying to do “account reconciliation” to determine which transactions contribute to a given balance.
I made a simple tool that does this client-side for an accountant friend a while ago: https://reconciliation.spcdx.com/
(Warning: runtimes scale quickly, due to the time complexity described above.)
Can't subset sum like in TFA be solved by dynamic programming using a modified 0-1 knapsack? (modified to keep track of selected items)
Yup, in O(M · n) time (and space), so with M = 1 million and, let's say, n = 1 million, the table has a trillion entries. At a byte per entry, that's a TB of memory, but it's otherwise doable.
PS: you don't want to risk a running time of the form O(Mn²).
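A minimal sketch of that "modified 0-1 knapsack" idea, keeping the full O(M · n) boolean table around so the chosen amounts can be read back off it (the numbers are illustrative):

    def subset_sum_with_items(amounts, target):
        """Knapsack-style DP that also recovers the chosen amounts.

        reach[i][s] is True iff some subset of amounts[:i] sums to s.
        Time and space are O(target * n), which is where the memory
        estimate above comes from.
        """
        n = len(amounts)
        reach = [[False] * (target + 1) for _ in range(n + 1)]
        reach[0][0] = True
        for i, a in enumerate(amounts, start=1):
            for s in range(target + 1):
                reach[i][s] = reach[i - 1][s] or (s >= a and reach[i - 1][s - a])
        if not reach[n][target]:
            return None
        # Walk the table backwards to see which amounts were used.
        chosen, s = [], target
        for i in range(n, 0, -1):
            if not reach[i - 1][s]:  # amounts[i-1] must have been used
                chosen.append(amounts[i - 1])
                s -= amounts[i - 1]
        return chosen

    print(subset_sum_with_items([5, 20, 35, 40, 50, 65], 100))  # [40, 35, 20, 5]

Only the previous row is needed to fill in each new row, but keeping every row is what makes the reconstruction step straightforward, hence the O(M · n) space.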
> weakly NP-hard
What does this mean?
Weakly NP-complete problems are problems that are numeric in nature and become polynomial-time solvable if the numbers are encoded in unary.
https://en.wikipedia.org/wiki/Weak_NP-completeness
Ah, thanks. Then, since "NP-hard" and "NP-complete" are different, I would expect "weakly NP-hard" and "weakly NP-complete" to be different; but, at least according to Wikipedia, they are synonymous.
I used to walk people through the "2sum" and "3sum" variants of this as an interview question. I was not looking for anything complicated; it was easy enough to talk through solutions that we could play with. A few people would take us down some rather complicated approaches. (I even gave offers to some who got through complicated solutions that wouldn't have worked.)
I never tried doing an "optimal" solution. Thinking about the problem, I'm curious if a BDD approach wouldn't work? Will try and write one up to see. Fun times.
Please tell me the position entailed more than just CRUD application work
I said I gave offers even for folks that didn't get it working. :D
I was using this more as a way to discuss code. And "2sum" is a pretty trivial problem to solve in linear time, especially if you allow a high-level language (see the sketch below).
(To directly answer the question, there was more than CRUD. Probably still more CRUD than folks would expect.)
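For reference, the linear-time "2sum" approach mentioned above is just a single pass with a hash set (a sketch; the numbers are arbitrary):

    def two_sum(nums, target):
        """Return a pair of values from nums summing to target, or None.

        One pass with a hash set: O(n) time, O(n) extra space.
        """
        seen = set()
        for x in nums:
            if target - x in seen:
                return (target - x, x)
            seen.add(x)
        return None

    print(two_sum([11, 7, 3, 19, 5], 22))  # (3, 19)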
In the real world this would be more complicated, because there could be more mixing and withdrawals. Banks also have a lot of visibility into account transactions, at least for accounts whose balances they know.
No, money needs to be like Monero and the state can f- right off with their panopticon financial surveillance. Sometimes it's better to be armed and tolerate or handle an occasional little bad-guy, than it is to succumb to a supermassive Goku-monster bending you over a barrel and freeing you into life-long [legal] disability with his gargantuan **** of [In]Justice™ for failure to report and pay that last $40 in income tax!
Ben Franklin on security and liberty, or something.
amazing