Wednesday, October 18, 2017

Games with incomplete information

There are traditional machine learning tasks, where the signal is hidden in the data, and once it is found, you're done.

There are games where the situation is well defined and known to everyone, opponents take turns, and the outcome is set in stone for each sequence of turns. It's just hard to look through all the possibilities to confirm whether this or that path has a winning ending for you. So you need a general idea of what the late stages of the game tend to look like, as well as the ability to see possibilities more steps ahead than your opponent. Information about the opponent becomes irrelevant; instead you just focus on predicting further. Then you can become good by playing against versions of yourself who are also trying to do so, and that pretty much guarantees you will be good against anyone else.

Now, games with incomplete information seem to require you to have some understanding of your opponent. A practical example is poker. The standard Nash equilibrium examples are of this kind too. The distinction is kind of blurry, though. In chess you can also try to win by understanding your opponent; it's just not an effective strategy in that context. The distinguishing feature of poker, besides having a lot of freedom with bets and multiple players, is that each player is dealt a hand that is not revealed until the end (if at all). So if in chess there is a clear deterministic outcome that you can calculate depending on the actions the players take, in poker the outcome at the end also depends on a hidden variable known to another player but not to you. If you just assign a uniform prior to that variable, you will win against completely stupid players, but you will lose pretty quickly to anyone with experience, because your actions reveal your hidden variable while you don't even try to learn anything about theirs. In the competition it becomes very meta: the player who can predict what others think he thinks others think... wins.

However, hidden variables do not even have to be random. If we ask the players to hide the action they choose until both of them have chosen something, and only then reveal it, that's exactly the same. Let's start with a game with no hidden info. Consider a game with two players, where each is assigned a score based on their choice and the opponent's last choice according to the matrix
a b
c d
They keep making choices in turn. One of the players picks the best entry within a row, max(a,b) or max(c,d); the other within a column, max(a,c) or max(b,d). Now consider a matrix like
0 -1
-1 1
It has two equilibria, with values 0 and 1, where the next choice is always the same for both players. However, swapping between the two incurs a penalty. Clearly, even without communicating, the players will switch to the second answer. Consider
-1 1
1 -1
This matrix encourages them to answer differently from each other. Consider
-1 1
-1 -1
This one is weird: the only way for either player to win is when the other one answers the first answer. But then that other player gets no benefit from doing so, unless he signals that he will stop giving freebies and wait for his opponent to answer the first answer in turn. If they understand each other quickly, they can form a sequence like 0110110110110110..., which will give them only about -n/3 in value. If one of them just gives in, he gets -n and the other gets +n.
A matrix like
0 0
1 0
is equivalent to the above.
1 1
0 0
will result in a game where the outcome doesn't depend on your choice locally, but you can be nice to your opponent and let him win. He can then do the same for you.
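
As a sanity check on the arithmetic above, here is a minimal Python sketch under one possible reading of the setup (my assumption, since the indexing is left loose here): players alternate moves, and the player moving at turn t scores M[previous move][own move]. It reproduces the roughly -n/3 value of the 011011... cycle and the lopsided outcome when one player just gives in.

def score_sequence(M, moves):
    """Return (player 0 total, player 1 total); player 0 moves at even turns."""
    totals = [0, 0]
    for t in range(1, len(moves)):
        totals[t % 2] += M[moves[t - 1]][moves[t]]
    return totals

M = [[-1, 1],
     [-1, -1]]

cycle = [0, 1, 1] * 20              # the cooperative 011011... pattern
print(score_sequence(M, cycle))     # combined total drifts to about -(number of moves)/3

lopsided = [0, 1] * 30              # player 0 always gives in, player 1 always takes
print(score_sequence(M, lopsided))  # player 0 keeps losing, player 1 keeps winning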


In principle their matrices do not have to be the same.
Also, the scoring system may be different: they may want to optimize the difference between their scores, not the individual scores. In that case, even the games above become more complicated! A first step of complication is when the differences are counted locally: that just introduces a dependence on your own previous step. But then the choice is not so easy anymore!

How about another simplification: yes, there are turns, but the first-player token shifts once both players have made their moves. The score is only counted within a pair of moves. Again there are matrices, in terms of the two possible answers of the first player (rows) and the second player (columns):
1 0
0 1
everyone just gets a zero score. Nothing to see here. Wait, what? How is the first player scored, exactly? Is only the second player scored? I guess so.
0 1
1 0,
1 1
1 0,
same, but
0 0
0 1
is special. Actually, only the max in each row matters. In this simple game, you can give something to your opponent in hopes that he will give something to you. And the only way to win is if he gives back more than you gave him by the time the game ends.

The game is actually very simple. You have two choices, 0 and 1. If you press 1, your opponent's score increases by 1. That's all. You take turns pressing the buttons. The game lasts n rounds. Your goal is to get a bigger score than your opponent.

The problem with this game is the presence of a simple solution where the game gets "stuck": since pressing 1 can never help you outscore your opponent, both players can just press 0 forever.
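
Here's a minimal sketch of this button game, restricted for brevity to strategies that only react to the opponent's last move (the strategy names are mine, purely illustrative): pressing 1 adds a point to the opponent, and the winner is whoever ends with the larger total.

def play(strategy_a, strategy_b, rounds=20):
    """Each strategy maps the opponent's last move to a move in {0, 1}."""
    scores = [0, 0]
    last = [0, 0]                       # previous move of each player
    for _ in range(rounds):
        for me, strat in enumerate((strategy_a, strategy_b)):
            opp = 1 - me
            move = strat(last[opp])
            if move == 1:
                scores[opp] += 1        # pressing 1 only ever helps the opponent
            last[me] = move
    return scores

never_give  = lambda opp_last: 0            # the "stuck" solution: never press 1
reciprocate = lambda opp_last: opp_last     # give back only if given to last turn

print(play(never_give, never_give))     # [0, 0]: nothing ever happens
print(play(reciprocate, reciprocate))   # also [0, 0]: nobody dares to start
print(play(lambda _: 1, never_give))    # the unconditionally generous player just loses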

Let's consider different scoring mechanisms. Now the first player gets the score. The matrix is again
0 0
0 1
so the second player may or may not give him the score if he asks for it. This game is a bit more interesting: the two players ask each other, and then it reduces to the first one.
Now, if both players get a score, there are more cases; below, the first matrix gives the first player's score and the second matrix the second player's:
0 0
0 0,
0 1
0 1
the second player can just increase his own score with his choice.
1 0
1 0,
0 1
0 1
the second player chooses whether to increase his own score or the opponent's, and clearly chooses his own.
1 0
1 0,
1 1
0 0
the first player chooses whether to increase the second player's score, and the second chooses whether to increase the first's; this reduces to the previous game.
1 0
1 0,
1 0
0 1
first one will always choose the one that leads to his own increase.
Strangely, nothing interesting here.

All of these games can in principle be solved by recursion: the very last step is obvious, regardless of the past configuration. Thus the step before that can be decided using the knowledge of what your opponent will do on the last step in any configuration. Et cetera, et cetera. The only time interaction with the opponent becomes more personal is when you're "begging" for scores, as in the examples above.
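
A toy version of that recursion, under the same (assumed) convention as the earlier sketch: the player moving at turn t scores M[previous move][own move], and each player maximizes his own total. This is just an illustration of the backward-induction idea, not a claim about any particular game above.

from functools import lru_cache

def solve(M, n_turns):
    """Backward induction for the alternating-move game."""
    @lru_cache(maxsize=None)
    def best(t, prev):
        # Returns (total of the player moving at turn t, total of the other player)
        # over the remaining turns t .. n_turns-1.
        if t == n_turns:
            return (0, 0)
        outcomes = []
        for move in (0, 1):
            opp_total, my_future = best(t + 1, move)  # roles swap next turn
            outcomes.append((M[prev][move] + my_future, opp_total))
        return max(outcomes)    # the mover picks whatever is best for himself
    return best(0, 0)           # start as if the previous answer was 0

print(solve([[-1, 1], [-1, -1]], 12))   # the "weird" matrix from above
print(solve([[0, -1], [-1, 1]], 12))    # the matrix with two equilibria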

Now, when the players reveal their answers simultaneously, things get a bit more tricky, and cooperation becomes more important, especially if the game is repeated several times. I'd like to give an example of a 3-player game that scores +1 to the players who voted together, -1 to the player left in the minority, and -1 to everyone if they all agree.
The interesting thing about this game is that your score does not depend on your own actions directly at all, only on whether the two other players agree or disagree with each other. But if the game is repeated several times, they can infer your strategy, and your score will start depending on your answers.
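
Here's that three-player voting game as a small scoring function plus a quick random-play check (the function and variable names are mine):

import random

def score_votes(votes):
    """votes: three 0/1 votes. The majority pair gets +1 each, the odd one
    out gets -1, and unanimity costs everyone -1."""
    if votes[0] == votes[1] == votes[2]:
        return [-1, -1, -1]
    majority = 1 if sum(votes) >= 2 else 0
    return [1 if v == majority else -1 for v in votes]

print(score_votes([0, 0, 1]))   # [1, 1, -1]
print(score_votes([1, 1, 1]))   # [-1, -1, -1]

# Against independent random voters every player's expected score is zero:
# your payoff only depends on whether the other two happen to agree.
totals = [0, 0, 0]
for _ in range(10000):
    round_scores = score_votes([random.randint(0, 1) for _ in range(3)])
    totals = [a + b for a, b in zip(totals, round_scores)]
print(totals)   # all three hover around zero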

Even in the taking-turns-in-sequence case, where each player gets scored according to the previous two votes, the game is essentially a game of begging. Half of the time the players might as well swap their vote for something completely random, so they can instead try to give things to the other players in hopes of getting something in return.

There's no winning strategy for such a game; you can't even win against random opponents. But in very special cases, if your opponent is consistent, you can hope to win something. Now imagine an algorithm trained to learn this game, to defeat other, previous-generation algorithms. What is it gonna be like?

Imagine 3 STEM PhDs paid hundreds of thousands of dollars to compete in this game, with the loser being fired. Of course they will build really sophisticated algorithms to outperform each other. Or maybe they will immediately form a coalition and kick the odd one out.

So the current state of machine trading is somewhat reminiscent of this game. I'd like to make an estimate of how long it takes to learn about your opponent. Every turn of the game you get 1 bit of information (whether you lose or win). In the hidden-variable case you don't even know how the others vote; otherwise you see the pattern of votes (and thus know your opponents' choices and get 2 bits every turn). So if you assume an algorithm of size n for your opponents, it will take O(n) times the assumed lookback to estimate what algorithm each of them has and to come up with your own. But that algorithm is itself much bigger, so the next generation of opponents will have to estimate a much bigger parameter space. However, it is highly non-generic, and it also has a transient period where it is clearly still estimating the opponents' behavior. The point is, at a fairly low level of "meta" the amount of data needed to estimate anything becomes intractable.
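
As a back-of-the-envelope version of that estimate (the parametrization is my own assumption, not anything from a real trading setup): suppose an opponent plays a deterministic function of the observed votes over the last few rounds (the lookback), with 2 observable bits per round. Pinning down such a strategy needs roughly one bit of evidence per possible history, while each round of play reveals only 1 or 2 bits.

def rounds_to_identify(lookback, bits_per_round=2):
    """Rough count of rounds needed to pin down a deterministic opponent."""
    n_histories = 4 ** lookback    # possible 2-bit-per-round histories
    strategy_bits = n_histories    # one output bit per possible history
    return strategy_bits / bits_per_round

for lookback in range(1, 8):
    print(lookback, rounds_to_identify(lookback))
# The required data grows like 4**lookback, so modeling an opponent who is
# himself modeling you (one more level of "meta") quickly becomes hopeless.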

