Wednesday, October 18, 2017

Games with incomplete information

There are traditional machine learning tasks, where signal is hidden in the data, and once it is found, you're done.

There are games where the situation is well defined, known to everyone, opponents take turns and the outcome is set in stone for each sequence of turns. It's just hard to look through all the possibilities to confirm whether this or that path has a winning ending for you. So you need to have a general idea of how the late stages of the game tend to look like, as well as the ability to see the possibility more steps ahead than your opponent. So the information about opponent becomes irrelevant, instead you just focus on predicting further. Then you are can become good at playing against versions of yourself who are also trying to do so, and that pretty much guarantees you will be good against anyone else.

Now games with incomplete information seem to require you to have some understanding of your opponent. A practical example is poker. Any Nash equilibrium examples are also such. The distinction is kind of blurry though. In chess, you can also try to win by understanding your opponent, it's just not an effective strategy in that context. The distinguishing feature of poker, besides having a lot of freedom with bets and multiple players, is that each player is dealt a hand that is not revealed until the end (if at all). So if in chess there was a clear deterministic outcome that you can calculate depending on actions the players take, in poker at the end the outcome also depends on a hidden variable known to another player but not you. If you just assign a uniform prior to that variable, you will win against completely stupid players, but you will lose pretty quickly to anyone with experience because your actions reveal your hidden variable while you don't even try to learn anything about theirs. In the competition, it becomes very meta: the player who can predict what others think he thinks others think... wins.

However, hidden variable do not even have to be random. If we ask the players to hide the action they choose until both of them chose something, and then reveal - that's exactly the same. Let's start with a game with no hidden info. Consider a game with two players, where they are assigned score based on their choice and the opponents last choice according to the matrix
a b
c d
They keep making choices in turn. One of the players chooses corresponding to max(a,b) and max(c,d), another to max(a,c), max(b,d). Now consider the matrix like
0 -1
-1 1
It has two equilibria - 0 and 1 - where the next choice is always the same for both players. However, swapping between the two incurs a penalty. Clearly, even without communicating players will switch to the second answer. Consider
-1 1
1 -1
This matrix encourages them to answer differently from each other. Consider
-1 1
-1 -1
This one is weird - the only way for any of the players to win is when another answers the first answer. But then the other doesn't get any benefit from doing so, unless he indicates that he will stop giving freebies and wait for the other player to answer the first answer. If they understand each other quickly, they can form a sequence like 0110110110110110.. which will give them only -n/3 value. If one of them submits, he gets -n and the other gets +n.
A matrix like
00
10
is equivalent to the above.
11
00
will result in the game where outcome doesn't depend on your choice locally, but you can be nice to your opponent and let him win. He can then do the same to you


In principle their matrices do not have to be the same.
Also the scoring system may be different: they may want to optimize the difference between their scores, not the individual scores. In that case, even the games above become more complicated! A first step of complication is if the differences are counted locally: that just introduces dependance on your previous step. But then the choice is not so easy anymore!

How about another simplification: yes there are turns, but the first player token shifts once the players made their moves. The score is only counted within a pair of moves. Again there are matrices, in terms of two possible answers of first player and second player:
10
01
everyone just gets zero score. Nothing to see here. Wait what? How is the first player scored? Is only the second player scored? I guess.
01
10,
11
10,
same, but
00
01
is special. Actually only  max in a row matters. In this simple game, you can give something to your opponent in hopes that he will give something to you. And the only way to win is if he gives back more than you gave him by the time the game ends.

The game is actually very simple. You have two choices, 0 and 1. If you press 1, your opponent's score increases by 1. That's all. You take turns pressing the buttons. The game lasts n rounds. Your goal is to get a bigger score than your opponent.

The problem with this game is the presence of a simple solution where the game gets "stuck".

Let's consider different scoring mechanisms. Now the first player gets the score. The matrix is again
00
01
so the second player may or may not give him the score if he asks for it. This game is a bit more interesting: the two players ask each other, and then it reduces to the first one.
Now if both players get the score, there are more choices, the score of first and second correspondingly:
00
00,
01
01
second can just increase his score by his choice.
10
10,
01
01
second choses whether to increase himself or the opponent, clearly chooses himself.
10
10,
11
00
 first one chooses to increase second, second chooses to increase first - reduces to the previous game.
10
10,
10
01
first one will always choose the one that leads to his own increase.
Strangely, nothing interesting here.

All of these games can in principle be solved by recursion: the very last step is obvious, regardless of the past configuration. Thus the step before that can be decided by using the knowledge of what your opponent will do on last step for any configuration. Et cetera, et cetera. The only time where interaction with the opponent becomes more personal is when you're "begging" for scores like in the examples above.

Now when the players reveal their answers simultaneously, things get a bit more tricky, and cooperation becomes more important especially if the game is repeated several times. I'd like to give an example of a 3 player game that scores +1 to players who voted together and -1 to the player who was a minority vote, and -1 to all if they all agree.
Interesting thing about this game is that your score does not depend on your actions directly at all, only on whether the two other players agree or disagree. But if the game is repeated several times, they can infer your strategy and your score will start depending on your answers.

Even in the taking turns in sequence case, when each one gets scored according to the previous two, the game is essentially a game of begging. Half of the times the players swap their vote for something completely random, so they can try to give things to other players in hopes of getting something in return.

There's no beating strategy to such game, you can't even win against random opponents. But in the very special cases, if your opponent is consistent, you can hope to win something. Now imagine an algorithm trained to learn this game, to defeat other, previous generation algorithms. What is it gonna be like?

Imagine 3 STEM PhD's paid hundreds of thousands of dollars to compete in this game, with the one losing being fired. Of course they will make really sophisticated algorithms to outperform each other. Or maybe they will immediately form a coalition and kick the odd one out.

So the current state of machine trading is somewhat reminiscent of this game. I'd like to make an estimate of how long it takes to learn about your opponent. Every turn of the game you get 1 bit of information (whether you lose or win). You don't even know how others vote in the hidden variable case, or you do see the pattern of votes (thus know your opponents and get 2 bits every turn). So if you assume the algorithm of size n for your opponents, it will take O(n)*assumed lookback to estimate what algorithm each one of them have, and come up with your own. But that algorithm by itself is much bigger, so the next generation of opponents will have to estimate much bigger parameters space. However it's highly non-generic, and also has a transient where it clearly estimates the opponents behavior. The point is, at fairly low step of "meta" the data sizes become intractable to estimate anything.


Scientific facts about investment

To the first approximation, investment is a roulette. The house (your broker) always wins thanks to fees, while you have 50/50 chances. This was shown pretty clearly in first part of the movie Wolf of Wall St, for instance. If you exchange dollars for euros today, and then exchange them back some other day when you feel like it, then ignoring the fees your expectation value of the difference in the dollar amounts is zero. Even the distribution is fairly uniform. Here by expectation value we mean our knowledge: we know about this number as much as we know about a random variable with zero mean and a normal distribution. In the same way, if you buy Apple stock today, and then sell it some day next year when you feel like it, there's no knowledge in science whether it's gonna be up or down then.

To the second approximation, S&P 500 appears to grow. So you have a small positive expectation value in your scientific knowledge for the purchase of Apple stock, that is essentially drowned in dispersion (random fluctuations). Buying S&P 500 index reduces that dispersion by a bit, so you expect to see that small positive expectation value after 3-5 years. The distribution is also skewed - it grows faster typically, but there are crises where it falls dramatically. Crises happen every 10 years or so. That is what has been observed in US for the last 100 years, and in other countries seemingly independently for some time. So a consensus seems to be that it will continue this way throughout the history of humanity. Here people confuse the more tangible concepts of economic growth, consumption, GDP, quality of life, population growth with the artificial concept of market price of S&P 500 (or any other index) stocks. They are actually not related by anything hard-wired in the system, or any inherent property of human psyche. The relationship between them is more like the relationship between the gold stored in fort Knox and the number of dollar bills in circulation. It used to be kept due to conservatism of the people in charge, but it can be completely abandoned whenever it becomes convenient. The value of dollar bills is actually completely independent of amount of gold in practice.

In the same way the price of stocks can in principle become independent from the wellbeing of the companies they represent. We are already in the age where computer algorithms are trading with each other, and the only thing they care about is beating each other, not the concept of value. There are regulators that enforce certain constraints on those algorithms, but the main constraint is against volatility, not against the deconstruction of meaning. So a more pessimistic picture of the future is that traditional investors will slowly become disappointed in value investing, and the prices of stocks will level out in accordance of regulations. The computer algorithms will have a lot of fun trading with each other, but stocks will become just another meaningless electronic currency that fluctuates within prescribed bounds, but is somewhat useless even for the leadership of the company that issued it. This transition is likely to happen around these years, so we might see the last impressive growth in the stock prices around the world. It's probably a good idea to ride this wave, it might be the last one.

To motivate a little bit why stock is decoupled from value, let me just list obvious things:
The company issues stocks, but does not have any obligation to pay dividends on it, or buy or sell it in any way. The company may have ups and downs, the profits may be redistributed as bonuses to the leadership or invested in a company of CEO's wife, either way the stockholders see little of  it. Essentially once the stock is issued, it's just a piece of paper. The voting rights don't matter because someone holds 51% anyways. You can't interact with the company in any significant way using the stock you own. For all intents and purposes, what's happening is just a crowd of people that have in their mind some idea of what price this or that piece of paper has. It's all in their heads. There's no fixed meaning of that piece of paper. If the crowd suddenly decides as a whole to do something else with it, there's no stopping force. Of course real crowd has inertia - there are always conservative people who stick to old ways. But the majority is pretty flexible. And what the majority is doing is going on a wild goose chase of pattern-seeking, sometimes glorified as machine learning or advanced statistics. Of course, those approaches provide good tools to tell whether the pattern is really there, or you're just imagining it. However the malpractice of using these tools in finance introduces several layers of selection biases, after which the original results of those tools become meaningless. First the author of the algorithm or strategy discards a bunch of strategies that didn't work. Then the hedge fund fires the workers whose strategies didn't perform. Then the hedge funds that didn't perform disappear and new ones are created. Finally the investors look at hedge funds, choose the ones that made the best impression using what they call "due diligence". Why is this multi-level scheme of choosing which strategy will the money follow bad? For instance, if at least one of the links in the chain is untrustworthy, the whole thing falls apart, and the huge chunk of money in possession of that given investor is invested according to a terrible, meaningless, essentially random strategy. Moreover, due to constant rotation of algorithms, quants, hedge funds, and investors themselves, there's no memory in the system.  If one tries to ask: how well do hedge funds perform in general? Or: how well do our employees perform in general? One gets an essentially random number that can happen to be positive, which is when it will be used in an advertisement brochure, but it also can happen to be negative, which is when it is not talked about. Selection bias is when we mention a few success stories like Steve Jobs, but never talk about 99% of failure stories. In the same way, the employee may not be 100% honest about his selection of the algorithm and not 100% blind in his crossvalidation. The company's choice of who to fire and whose algorithm to invest money into is usually rushed, and the good statistics are exaggerated (p-value hacking). The successful hedge funds get media and investor attention, while the failed ones disappear quietly, and nobody really counts how many of them were there. Successful investors have a lot of money and a personally confirmed belief that the whole system works so the majority of money in the future is distributed by people whose personal experience was not representative of the system as a whole. Of course locally this industry looks legit, and every quant believe himself to be smarter than the other guy, and his statistical tools to be flawless. So unless you compare his experiences to the experiences of another quant who also believed he was smart but lost, everything seems to be working. Look at people who were fired from hedge funds. Are they really objectively less smart than the ones that work there today? Or were they just unlucky? When there's a hedge fund with flawless 30-year record that you see, how many other hedge funds were there that lost money and lost investors so you don't see them?

For the technical and fundamental traders of the 90'ies, the numbers really added up to zero as they should, ignoring the "making money from air" effect (overall S&P 500 growth, and people who grew thanks to market manipulation, fees, bubbles aka pyramid schemes). That's not bad by itself, in every competition there are winners and losers. It's even not surprising to have zero-sum game where the total wealth remains the same. Still you would expect that the person who professionally trades, as in he convinced somebody to pay him fees to trade their money, will perform better than the random number generator. This was not the case - the industry was so deeply rotten that no selection based on the promo materials and no due diligence were able to correlate with the positive performance of the professional investors. They all were in it for the fees after all. Now there's a myth that the machine trading hedge funds are more honest about their performance reports. And some of them definitely are. But the scientific hypotheses is that the algorithmic trading is equally rotten so that from the outside point of view there's no way to tell apart the fakes from the real ones. They themselves don't know if they are fake sometimes, because quants who have doubts about it just get fired, or not get hired in the first place.

Anyways, even if there's a positive returns promise that we can believe from some of the algorithmic trading, it is not accessible to ordinary investors. You need to have a lot of money to even begin talking to some of the hedge funds. There are platforms which allow you to use openly available algorithms and reproduce whatever the hedge funds are doing, at your own risk.  But no statistically significant good strategies really exist. The trick with statistical significance is that for strong enough noise, nothing is statistically significant. Moreover, the more complicated your strategy is, the smaller the noise that will completely ruin it's significance. Suppose your strategy has n parameters that can be 0 or 1. So you have 2^n strategies. The base probability is 1/2^n for each of those to be the best one. Suppose the strategies true expectation value of working at all is e<<1, and dispersion 1. Then for it to be picked, you need a sample of poly(n)/e points. But when you write a strategy, you make a lot of choices, so n is pretty big actually. For any interesting algorithm, the sample size needed is enormous - much bigger than the trading data available. And all the simple algorithms were already been found and washed out. So the robots have to choose algorithms in the pool where there's no statistically significant difference between them. This is just for illustrative purposes and is oversimplified.

I don't think the idea of robots fighting each other in a game of insufficient information is that far-fetched. It's like horse-races. You need to spend a lot of time and money on your horse for it to not lose immediately, but your competitors are doing the same, and in the end there's no way to predict who's gonna win. In the same way, an amateur-made algorithm will immediately lose on the fees, but even the professionally made algorithms are essentially giving back random performance. Of course that performance is properly smooth, so they create an illusion of doing something intelligent (or otherwise they would have been discarded at the selection process), but they can smoothly loose money just as well as they can smoothly win.