In game theory, Nash equilibrium (named after John Forbes Nash, who proposed it) is a solution concept of a game involving two or more players, in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only his own strategy unilaterally. If each player has chosen a strategy and no player can benefit by changing his or her strategy while the other players keep theirs unchanged, then the current set of strategy choices and the corresponding payoffs constitute Nash equilibrium.

Stated simply, Amy and Phil are in Nash equilibrium if Amy is making the best decision she can, taking into account Phil’s decision, and Phil is making the best decision he can, taking into account Amy’s decision. Likewise, a group of players is in Nash equilibrium if each one is making the best decision that he or she can, taking into account the decisions of the others.

However, Nash equilibrium does not necessarily mean the best payoff for all the players involved; in many cases, all the players might improve their payoffs if they could somehow agree on strategies different from the Nash equilibrium: e.g., competing businesses forming a cartel in order to increase their profits.

The prisoner’s dilemma is a fundamental problem in game theory that demonstrates why two people might not cooperate even if it is in both their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence payoffs and gave it the “prisoner’s dilemma” name (Poundstone, 1992).

A classic example of the prisoner’s dilemma (PD) is presented as follows: Two suspects are arrested by the police. The police have insufficient evidence for a conviction, and, having separated the prisoners, visit each of them to offer the same deal. If one testifies for the prosecution against the other (defects) and the other remains silent (cooperates), the defector goes free and the silent accomplice receives the full one-year sentence. If both remain silent, both prisoners are sentenced to only one month in jail for a minor charge. If each betrays the other, each receives a three-month sentence. Each prisoner must choose to betray the other or to remain silent. Each one is assured that the other would not know about the betrayal before the end of the investigation.

How should the prisoners act?

If we assume that each player cares only about minimizing his or her own time in jail, then the prisoner’s dilemma forms a non-zero-sum game in which two players may each either cooperate with or defect from (betray) the other player. In this game, as in most game theory, the only concern of each individual player (prisoner) is maximizing his or her own payoff, without any concern for the other player’s payoff. The unique equilibrium for this game is a Pareto-suboptimal solution, that is, rational choice leads the two players to both play defect, even though each player’s individual reward would be greater if they both played cooperatively.

In the classic form of this game, cooperating is strictly dominated by defecting, so that the only possible equilibrium for the game is for all players to defect. No matter what the other player does, one player will always gain a greater payoff by playing defect. Since in any situation playing defect is more beneficial than cooperating, all rational players will play defect, all things being equal.

In the iterated prisoner’s dilemma, the game is played repeatedly. Thus each player has an opportunity to punish the other player for previous non-cooperative play. If the number of steps is known by both players in advance, economic theory says that the two players should defect again and again, no matter how many times the game is played. Only when the players play an indefinite or random number of times can cooperation be an equilibrium (technically a subgame perfect equilibrium), meaning that both players defecting always remains an equilibrium and there are many other equilibrium outcomes. In this case, the incentive to defect can be overcome by the threat of punishment.

In casual usage, the label “prisoner’s dilemma” may be applied to situations not strictly matching the formal criteria of the classic or iterative games, for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it merely difficult or expensive, not necessarily impossible, to coordinate their activities to achieve cooperation.

Strategy for the classic prisoner’s dilemma

The classical prisoner’s dilemma can be summarized thus:

Prisoner B stays silent (cooperates) Prisoner B confesses (defects) Prisoner A stays silent (cooperates) Each serves 1 month Prisoner A: 1 year Prisoner B: goes free Prisoner A confesses (defects) Prisoner A: goes free Prisoner B: 1 year Each serves 3 months

Imagine you are player A. If player B decides to stay silent about committing the crime then you are better off confessing, because then you will get off free. Similarly, if player B confesses then you will be better off confessing, since then you get a sentence of 3 months rather than a sentence of 1 year. From this point of view, regardless of what player B does, as player A you are better off confessing. One says that confessing (defecting) is the dominant strategy.

As Prisoner A, you can accurately say, “No matter what Prisoner B does, I personally am better off confessing than staying silent. Therefore, for my own sake, I should confess.” However, if the other player acts similarly then you both confess and both get a worse sentence than you would have gotten by both staying silent. That is, the seemingly rational self-interested decisions lead to worse sentences—hence the seeming dilemma. In game theory, this demonstrates that in a non-zero-sum game a Nash equilibrium need not be a Pareto optimum.

Although they are not permitted to communicate, if the prisoners trust each other then they can both rationally choose to remain silent, lessening the penalty for both of them.

We can expose the skeleton of the game by stripping it of the prisoner framing device. The generalized form of the game has been used frequently in experimental economics. The following rules give a typical realization of the game.

There are two players and a banker. Each player holds a set of two cards, one printed with the word “Cooperate” (as in, with each other), the other printed with “Defect” (the standard terminology for the game). Each player puts one card face-down in front of the banker. By laying them face down, the possibility of a player knowing the other player’s selection in advance is eliminated (although revealing one’s move does not affect the dominance analysis[1]). At the end of the turn, the banker turns over both cards and gives out the payments accordingly.

Given two players, “red” and “blue”: if the red player defects and the blue player cooperates, the red player gets the Temptation to Defect payoff of 5 points while the blue player receives the Sucker’s payoff of 0 points. If both cooperate they get the Reward for Mutual Cooperation payoff of 3 points each, while if they both defect they get the Punishment for Mutual Defection payoff of 1 point. The checker board payoff matrix showing the payoffs is given below.

These point assignments are given arbitrarily for illustration. It is possible to generalize them, as follows: Canonical PD payoff matrix Cooperate Defect Cooperate R, R S, T Defect T, S P, PWhere T stands for Temptation to defect, R for Reward for mutual cooperation, P for Punishment for mutual defection and S for Sucker’s payoff. To be defined as prisoner’s dilemma, the following inequalities must hold:

T > R > P > S

This condition ensures that the equilibrium outcome is defection, but that cooperation Pareto dominates equilibrium play. In addition to the above condition, if the game is repeatedly played by two players, the following condition should be added.[2]

2 R > T + S

If that condition does not hold, then full cooperation is not necessarily Pareto optimal, as the players are collectively better off by having each player alternate between Cooperate and Defect.

These rules were established by cognitive scientist Douglas Hofstadter and form the formal canonical description of a typical game of prisoner’s dilemma.

A simple special case occurs when the advantage of defection over cooperation is independent of what the co-player does and cost of the co-player’s defection is independent of one’s own action, i.e. T+S = P+R. The iterated prisoner’s dilemma

If two players play prisoner’s dilemma more than once in succession and they remember previous actions of their opponent and change their strategy accordingly, the game is called iterated prisoner’s dilemma. The iterated prisoner’s dilemma game is fundamental to certain theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modelled by a multi-player, iterated, version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner’s dilemma has also been referred to as the “Peace-War game”.

If the game is played exactly N times and both players know this, then it is always game theoretically optimal to defect in all rounds. The only possible Nash equilibrium is to always defect. The proof is inductive: one might as well defect on the last turn, since the opponent will not have a chance to punish the player. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.

Unlike the standard prisoner’s dilemma, in the iterated prisoner’s dilemma the defection strategy is counterintuitive and fails badly to predict the behavior of human players. Within standard economic theory, though, this is the only correct answer. The superrational strategy in the iterated prisoners dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies agree with the superrational version, not the game-theoretic rational one.

For cooperation to emerge between game theoretic rational players, the total number of rounds N must be random, or at least unknown to the players. In this case always defect may no longer be a strictly dominant strategy, only a Nash equilibrium. Amongst results shown by Nobel Prize winner Robert Aumann in his 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.