×

## I. The basics and dominant strategies

Game theory is a mathematical discipline which models interactions among individuals and examines the strategies they use in these interactions. We quantify the satisfaction of an individual with the outcome of an interaction by the utility, a number that depends on the outcome. By convention a player is more satisfied if they receive a higher utility, i.e. they get 4 instead of 2 from a game. In the final challenge, the utility will be food.

A strategy is a complete set of actions which correspond to every possible situation. For example, imagine a "snowball fight" game where you and your opponent each have a snowball and you decide whether to throw the snowball or duck. If you play simultaneously, or decide at the same time, then your complete strategy is simply "throw" or "duck"; your opponent's decision doesn't impact yours. Strategies get more complicated if the game is sequential (one person decides before the other). If your opponent decides first, then the set of your possible strategies is:

• throw if opponent throws, duck if they duck.
• throw if opponent ducks, duck if they throw.
• throw whatever opponent does.
• duck whatever opponent does.

A set of actions is a strategy if and only if you have a response to every possible thing that your opponent can play.

The goal of a game is usually to end up with the highest utility possible. In some games, the goal may be to reach a certain level of utility first, or to end up with the higher utility than everyone else after the game has ended. Taking into account the goal of your game, you should:

• analyze the possible actions of your opponents,
• come up with your best response to every action your opponents might have taken.

This set of "best responses" is your strategy.The central assumption of game theory is rationality. In short, rationality means that neither you nor your opponent will play a strategy that is not good for you/them. Note that in the final challenge, your fellow competitors may not technically act rationally if they have misanalyzed the game.

The standard representation of a game is through a payoff matrix in which vertical and horizontal rows represent strategies of two players, and their utilities are given by the intersection of the rows, as shown in Game One below:

Game One

The payoff matrix is interpreted in the following way: if a row player (player 1) plays $$l$$, and column player (player 2) plays $$R$$, the intersection of the row $$l$$ and column $$R$$ give $$(3,4)$$. That would mean that the row player gets 3 and the column player gets 4 (by convention, first number in a matrix entry corresponds to the row player).

In Game One, player 1 can play $$l$$ or $$r$$, while player 2 can play $$L$$ or $$R$$. Under the assumption of rationality, that each player will try to maximize their utility, one can solve this game. Solving a game means finding the best strategies for each player. Whatever player 2 plays, player 1 will be better off if he plays $$l$$ since the utility of playing $$l$$ is higher than the utility of playing $$r$$ in both cases ($$4 > 2$$ and $$3 > 2$$). Therefore, his dominant strategy would be $$l$$. A dominant strategy always gives at least as good payoff as any other strategy.

Just by looking at the game, we see that Player 2 does not have a dominant strategy since it's better for him to play L if Player 1 plays $$l$$, and $$R$$ if Player 1 plays $$r$$. However, Player 2 knows what the game looks like, so if we assume he's rational, he will figure out that Player 1 will always play $$l$$. Therefore, Player 2 will always play $$L$$. The winning strategies for this game are $$(l, L)$$.

Probably the most famous example of a game is the Prisoner's Dilemma. In this game, there are two prisoners who are being questioned by the police. They can either cooperate (play $$C$$) with each other and stay silent, or defect and confess (play $$D$$). If one of them defects and the other doesn't, the one that defects goes free while the one who cooperates goes to jail for a very long time. If both cooperate, they'll go to prison only for a short time since there is not enough evidence. If both defect, both will go to prison for a medium amount of time. A sample payoff matrix for this game is therefore

Prisoner's dilemma

where higher payoffs correspond to shorter jail times. The intuitive solution to this game would be $$(c,C)$$ since that gives a high payoff for both players. However, if we inspect the game, we can conclude that whatever one prisoner decides, it is better for the other prisoner to defect and confess. Therefore, the equilibrium of this game is actually $$(d, D)$$.

## II. Nash equilibrium

Although dominant strategy equilibrium is a very strong solution, many realistic games cannot be solved like this due to their payoffs, and there is no clear "winning strategy". In that case, a player can assume the actions of the opponents and find the best response to them, but there is obviously no guarantee that the opponent will play the estimated strategy. A weaker solution concept very often used to solve a game is that of Nash equilibrium. Nash equilibrium assumes that you can guess the other player's strategy and that your strategy is a best response to it; therefore, both players play strategies that are best response to each other. Nash equilibrium is different from dominant strategy equilibrium in that a dominant strategy is a best response to all the strategies, whereas Nash equilibrium is a best response only to a particular strategy. Therefore, every dominant strategy equilibrium is also a Nash equilibrium, but the reverse is not true.

## III. Games with mixed strategies

Not all games have a Nash equilibrium in the form of pure strategies, i.e. where one plays a certain move all the time. For these games the players can also play mixed strategies, in which they play a certain move with some probability. When they play mixed strategies, the players want to maximize their expected payoff, which is a weighted sum of possible payoffs. The key to solving games with mixed strategies is to realize that in order to mix the two moves randomly, a player has to be indifferent between them.

Consider the following example of the "Asymmetric Prisoner's Dilemma".

### Asymmetric Prisoner's Dilemma

Asymmetric Prisoner's Dilemma

Assume Player 1 plays $$c$$ with probability $$\alpha$$ and $$d$$ with probability $$1-\alpha$$. Player 2 plays $$C$$ with probability $$\beta$$ and $$D$$ with $$1-\beta$$. A strategy profile for this game would be a set of probabilities $$\alpha$$ and $$\beta$$.

Since Player 1 plays both $$c$$ and $$d$$ with some probability, he has to be indifferent between them. In other words, the expected payoffs from the two moves have to be the same. Therefore:

$$5\beta + 0(1-\beta) = 6\beta -2(1 - \beta)$$

from which we find $$\beta = \frac{2}{3}$$.

Similarly, Player 2 has to be indifferent between $$C$$ and $$D$$:

$$5\alpha + 0(1- \alpha) = 8 \alpha -1(1 - \alpha)$$

from which we find $$\alpha = \frac{1}{4}$$. Therefore, the Nash equilibrium strategy for Player 1 is to play $$c$$ with probability $$\alpha = \frac{1}{4}$$ and $$d$$ with probability $$\frac{3}{4}$$. Similarly, Player 2 would play $$C$$ with $$p_c = \frac{2}{3}$$ and $$D$$ with $$p_d = \frac{1}{3}$$.

## IV. Public Goods

In many realistic situations there are benefits to cooperation that accrue to all players, but only if a certain number of players cooperate. These benefits are called public goods. A typical public good is a community playground built by donations. The playground can be built only if enough people help, but the playground benefits everyone equally. Public goods tend to increase the chance that people will cooperate.

## V. Repeated games and reputation

We already saw that when you play Prisoner's Dilemma once, the dominant strategy is to defect. If you play Prisoner's Dilemma finitely many times, defecting every time is still the dominant strategy according to backward induction. It's best to defect in the last game since it is the same as if you were playing one game, but then recursively you will defect in the previous game etc. However, if you play repeated Prisoner's Dilemma, or any other game, but you do not know when it will end (so it is in essence infinitely repeated) you cannot apply backward induction, and many new optimal strategies emerge. For example, you could cooperate in one stage and signal to your partner that you will like to cooperate in the future, in which case he may cooperate as well. Since $$(c,C)$$ would yield higher payoff than $$(d,D)$$, you would on average be better off in the long run. There are many eq.uilibrium strategies that would have both cooperation and defection as optimal choices depending on the history of opponent's moves.

In a more realistic scenario, one player is not stuck playing with only one other player forever. Instead, both players are part of a larger population and they play off and on with many players. In such a game, one cannot necessarily signal a strategy to the other player as you may only play them once. This scenario is typical among companies in a market, a company may deal with another company just once or only occasionally. When players are part of a population, however, they can have a reputation, which tells how they on average play. If a company does good work and treats other companies fairly, then even though they may not make as much money on any individual deal, their reputation increases and they are likely to get more future work. In contrast, those companies out for a quick buck may get a large payoff once, but that strategy is unlikely to be sustainable as their reputation sinks and other companies either do not do business with them or use sub-optimal strategies (in essence constantly defect).

×