Zero-sum game

From 集智百科. Revision as of 18:26, 5 December 2020 (Saturday), by 打豆豆.

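This entry was initially translated by 水流心不竞 (1,421 characters translated); apologies for any inconvenience in reading.

This entry is provisionally reviewed by Miyasaki.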

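In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which each participant's gain or loss of utility is exactly balanced by the losses or gains of the other participants. If the total gains of the participants are added up and the total losses are subtracted, the result is zero. Thus, cutting a cake is a zero-sum game if every part of the cake is agreed to have equal value: taking a piece reduces the amount of cake available to the others by exactly as much as it increases the amount available to the taker.

In contrast, non-zero-sum describes a situation in which the interacting parties' aggregate gains and losses can be less than or more than zero. A zero-sum game is also called a strictly competitive game, while non-zero-sum games can be either competitive or non-competitive. Zero-sum games are most often solved with the minimax theorem, which is closely related to linear programming duality.[1]

Many people have a cognitive bias towards seeing situations as zero-sum, known as zero-sum bias.

A zero-sum game is a specific case of a constant-sum game, one in which the payoffs of every outcome always sum to zero. Such games are distributive, not integrative; the pie cannot be enlarged by good negotiation.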

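Definition

The zero-sum property (if one gains, another loses) means that any result of a zero-sum situation is Pareto optimal. Generally, any game where all strategies are Pareto optimal is called a conflict game.[2]

Situations where participants can all gain or suffer together are referred to as non-zero-sum. Thus, a country with an excess of bananas trading with another country for its excess of apples, where both benefit from the transaction, is in a non-zero-sum situation. Other non-zero-sum games are games in which the sum of gains and losses by the players is sometimes more or less than what they began with.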
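The distinction between zero-sum, constant-sum, and non-zero-sum games can be checked mechanically by summing the payoffs of each outcome. The following is a minimal sketch; the function name `classify` and all payoff numbers are hypothetical and chosen only for illustration.

```python
def classify(outcomes):
    """Classify a game from its payoff tuples (one tuple per outcome)."""
    totals = {sum(payoffs) for payoffs in outcomes}
    if totals == {0}:
        return "zero-sum"        # every outcome balances gains and losses
    if len(totals) == 1:
        return "constant-sum"    # same total everywhere, but not zero
    return "non-zero-sum"        # totals differ between outcomes


# Hypothetical payoffs, for illustration only.
matching_pennies = [(1, -1), (-1, 1), (-1, 1), (1, -1)]
split_a_cake = [(3, 1), (2, 2), (1, 3)]   # four equal slices shared out
fruit_trade = [(0, 0), (2, 3)]            # no trade vs. a mutually useful trade

print(classify(matching_pennies))  # zero-sum
print(classify(split_a_cake))      # constant-sum
print(classify(fruit_trade))       # non-zero-sum
```

Subtracting each player's baseline share from a constant-sum game such as the cake split turns it into a zero-sum game, which is why the two are usually treated together.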

The idea of Pareto optimal payoff in a zero-sum game gives rise to a generalized relative selfish rationality standard, the punishing-the-opponent standard, in which both players always seek to minimize the opponent's payoff at a favorable cost to themselves rather than simply preferring more over less. The punishing-the-opponent standard can be used in both zero-sum games (e.g. war games, chess) and non-zero-sum games (e.g. pooling selection games).[3]

Solution

For two-player finite zero-sum games, the different game-theoretic solution concepts of Nash equilibrium, minimax, and maximin all give the same solution. If the players are allowed to play a mixed strategy, the game always has an equilibrium.

Example

A zero-sum game (each cell gives Red's payoff, then Blue's):

                 Blue: A     Blue: B     Blue: C
Red: 1           30, -30     -10, 10     20, -20
Red: 2           -10, 10     20, -20     -20, 20

A game's payoff matrix is a convenient representation. Consider for example the two-player zero-sum game shown above.

The order of play proceeds as follows: the first player (Red) chooses in secret one of the two actions 1 or 2; the second player (Blue), unaware of the first player's choice, chooses in secret one of the three actions A, B or C. Then the choices are revealed and each player's points total is affected according to the payoff for those choices.

Example: Red chooses action 2 and Blue chooses action B. When the payoff is allocated, Red gains 20 points and Blue loses 20 points.

In this example game, both players know the payoff matrix and attempt to maximize the number of their points. Red could reason as follows: "With action 2, I could lose up to 20 points and can win only 20, and with action 1 I can lose only 10 but can win up to 30, so action 1 looks a lot better." With similar reasoning, Blue would choose action C. If both players take these actions, Red will win 20 points. If Blue anticipates Red's reasoning and choice of action 1, Blue may choose action B, so as to win 10 points. If Red, in turn, anticipates this trick and goes for action 2, this wins Red 20 points.

Émile Borel and John von Neumann had the fundamental insight that probability provides a way out of this conundrum. Instead of deciding on a definite action to take, the two players assign probabilities to their respective actions, and then use a random device which, according to these probabilities, chooses an action for them. Each player computes the probabilities so as to minimize the maximum expected point-loss independent of the opponent's strategy. This leads to a linear programming problem whose solution gives the optimal strategies for each player. This minimax method can compute provably optimal strategies for all two-player zero-sum games.
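The back-and-forth reasoning in the example never settles because the game has no equilibrium in pure strategies. A short sketch makes this visible by comparing the best guaranteed outcome for each side. The matrix below is an assumption reconstructed from the payoffs quoted in the text (the entry for Red 2 against Blue A is not stated there and is taken to be -10); the function name is hypothetical.

```python
# Red's payoffs (Blue's are the negatives).
# Rows: Red's actions 1 and 2. Columns: Blue's actions A, B, C.
# Assumed values; only some entries are quoted in the text.
RED = [[30, -10, 20],
       [-10, 20, -20]]


def pure_security_levels(payoff):
    """Return (maximin, minimax) over pure strategies for the row player's payoffs."""
    # Row player picks the row whose worst case is best.
    maximin = max(min(row) for row in payoff)
    # Column player picks the column whose best case for the row player is smallest.
    minimax = min(max(col) for col in zip(*payoff))
    return maximin, minimax


low, high = pure_security_levels(RED)
print(low, high)  # -10 20
```

Red can guarantee no worse than -10 and Blue can hold Red to at most 20 with pure actions. Because the two numbers differ, there is no saddle point, and each player gains from second-guessing the other, which is the conundrum that randomization resolves.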


For the example given above, it turns out that Red should choose action 1 with probability 4/7 and action 2 with probability 3/7, and Blue should assign the probabilities 0, 4/7, and 3/7 to the three actions A, B, and C. Red will then win 20/7 points on average per game.
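The mixed strategies for the example can be verified with exact arithmetic. The sketch below assumes the payoff matrix reconstructed from the text (the Red 2 against Blue A entry of -10 is an assumption) and checks the probabilities 4/7 and 3/7 for Red and 0, 4/7, 3/7 for Blue; the helper names are hypothetical.

```python
from fractions import Fraction as F

# Red's payoffs; rows are Red's actions 1, 2 and columns are Blue's A, B, C.
# Assumed values; only some entries are quoted in the text.
RED = [[30, -10, 20],
       [-10, 20, -20]]


def expected(p_red, q_blue, payoff=RED):
    """Expected payoff to Red when both players randomize independently."""
    return sum(p * q * payoff[i][j]
               for i, p in enumerate(p_red)
               for j, q in enumerate(q_blue))


def pure(k, n):
    """Probability vector of length n that always plays action k."""
    return [F(int(i == k)) for i in range(n)]


red_mix = [F(4, 7), F(3, 7)]
blue_mix = [F(0), F(4, 7), F(3, 7)]

# Red's mix against each pure Blue action: never below 20/7.
vs_blue = [expected(red_mix, pure(k, 3)) for k in range(3)]
# Blue's mix against each pure Red action: never above 20/7.
vs_red = [expected(pure(k, 2), blue_mix) for k in range(2)]

print([str(x) for x in vs_blue])  # ['90/7', '20/7', '20/7']
print([str(x) for x in vs_red])   # ['20/7', '20/7']
```

Red's guaranteed minimum and Blue's guaranteed maximum coincide at 20/7, so neither player can improve by deviating, and 20/7 is Red's average gain per game.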



Solving

The Nash equilibrium for a two-player, zero-sum game can be found by solving a linear programming problem. Suppose a zero-sum game has a payoff matrix M where element [math]M_{i,j}[/math] is the payoff obtained when the minimizing player chooses pure strategy i and the maximizing player chooses pure strategy j (i.e. the player trying to minimize the payoff chooses the row and the player trying to maximize the payoff chooses the column). Assume every element of M is positive. The game will have at least one Nash equilibrium. The Nash equilibrium can be found (Raghavan 1994, p. 740) by solving the following linear program to find a vector u:

Minimize:
[math]\displaystyle{ \sum_{i} u_i }[/math]

Subject to the constraints:

u ≥ 0
M u ≥ 1.


The first constraint says each element of the u vector must be nonnegative, and the second constraint says each element of the M u vector must be at least 1. For the resulting u vector, the inverse of the sum of its elements is the value of the game. Multiplying u by that value gives a probability vector, giving the probability that the maximizing player will choose each of the possible pure strategies.
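As a rough illustration of the linear program above, the sketch below solves it for a game in which the maximizing player has exactly two pure strategies, by brute-force enumeration of the vertices of the feasible region. This is not a general LP solver; the function name `solve_game`, the restriction to two columns, and the shift of 30 are assumptions made for the example. The matrix is the earlier Red and Blue game with Blue (minimizing) on the rows, Red (maximizing) on the columns, and 30 added to every entry so that all entries are positive.

```python
from fractions import Fraction as F
from itertools import combinations


def solve_game(M):
    """Minimize u1 + u2 subject to M u >= 1 and u >= 0, for a two-column M.

    Rows belong to the minimizing player, columns to the maximizing player,
    and every entry of M is assumed positive. Returns the game value and
    the maximizing player's mixed strategy.
    """
    # Each constraint (a1, a2, b) means a1*u1 + a2*u2 >= b.
    cons = [(F(r[0]), F(r[1]), F(1)) for r in M]
    cons += [(F(1), F(0), F(0)), (F(0), F(1), F(0))]
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel constraints have no unique intersection
        # Intersection of the two boundary lines, by Cramer's rule.
        u = ((b * c2 - a2 * d) / det, (a1 * d - b * c1) / det)
        feasible = all(p * u[0] + q * u[1] >= r for p, q, r in cons)
        if feasible and (best is None or sum(u) < sum(best)):
            best = u
    value = 1 / sum(best)                     # reciprocal of the sum of u
    strategy = tuple(x * value for x in best)  # u scaled to a probability vector
    return value, strategy


SHIFT = 30  # assumed constant that makes every entry positive
# Rows: Blue's A, B, C. Columns: Red's 1, 2. Entries: Red's payoff + SHIFT.
M = [[30 + SHIFT, -10 + SHIFT],
     [-10 + SHIFT, 20 + SHIFT],
     [20 + SHIFT, -20 + SHIFT]]

value, strategy = solve_game(M)
print(value - SHIFT)               # 20/7
print([str(p) for p in strategy])  # ['4/7', '3/7']
```

Subtracting the shift from the computed value recovers the value of the original game, and the strategy is unaffected by the shift. For larger games a proper LP solver would be used instead of vertex enumeration.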

If the game matrix does not have all positive elements, simply add to every element a constant that is large enough to make them all positive. That will increase the value of the game by that constant, and will have no effect on the equilibrium mixed strategies.

The equilibrium mixed strategy for the minimizing player can be found by solving the dual of the given linear program. Alternatively, it can be found by applying the above procedure to a modified payoff matrix, the transpose and negation of M (with a constant added so that it is positive), and solving the resulting game.
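The dual mentioned above can be written out explicitly. The following is a sketch from standard linear programming duality, not taken from the source; the dual vector w is a name introduced here for illustration, with one entry per pure strategy of the minimizing player.

Maximize:
[math]\displaystyle{ \sum_{i} w_i }[/math]

Subject to the constraints:

w ≥ 0
[math]\displaystyle{ M^{\mathsf{T}} w \le 1. }[/math]

At the optimum the two programs have the same objective value, so the reciprocal of the sum of the elements of w is again the value of the game, and multiplying w by that value gives the minimizing player's mixed strategy. In the earlier example with 30 added to every entry, w = (0, 4/230, 3/230) satisfies both constraints with equality on the columns and yields Blue's probabilities 0, 4/7, and 3/7.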


If all the solutions to the linear program are found, they will constitute all the Nash equilibria for the game. Conversely, any linear program can be converted into a two-player, zero-sum game by using a change of variables that puts it in the form of the above equations. So such games are equivalent to linear programs, in general.[citation needed]

Universal solution

If avoiding a zero-sum game is an action choice with some probability for players, avoiding is always an equilibrium strategy for at least one player in a zero-sum game. For any two-player zero-sum game where a zero-zero draw is impossible or non-credible after play has started, such as poker, there is no Nash equilibrium strategy other than avoiding the play. Even if there is a credible zero-zero draw after a zero-sum game has started, it is no better than the avoiding strategy. In this sense, it is interesting that reward-as-you-go in optimal choice computation prevails over all two-player zero-sum games with regard to whether or not to start the game.[4]





The most common or simple example from the subfield of social psychology is the concept of "social traps". In some cases pursuing individual personal interest can enhance the collective well-being of the group, but in other situations all parties pursuing personal interest results in mutually destructive behavior.

Complexity

It has been theorized by Robert Wright in his book Nonzero: The Logic of Human Destiny that society becomes increasingly non-zero-sum as it becomes more complex, specialized, and interdependent.


Extensions

In 1944, John von Neumann and Oskar Morgenstern proved that any non-zero-sum game for n players is equivalent to a zero-sum game with n + 1 players, the (n + 1)th player representing the global profit or loss.[6]
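The bookkeeping behind this construction can be sketched in a few lines: a fictitious extra player absorbs whatever the real players jointly gain or lose, so every outcome sums to zero. The function name and the payoff numbers below are hypothetical, and the sketch only illustrates the accounting, not the full equivalence argument.

```python
def add_balancing_player(outcomes):
    """Append a fictitious player whose payoff is minus the others' total."""
    return [tuple(p) + (-sum(p),) for p in outcomes]


# Hypothetical two-player, non-zero-sum payoffs (illustration only).
trade = [(0, 0), (2, 3), (-1, -1)]
balanced = add_balancing_player(trade)

print(balanced)  # [(0, 0, 0), (2, 3, -5), (-1, -1, 2)]
print(all(sum(p) == 0 for p in balanced))  # True
```

The added player has no choices of its own, so the strategic situation of the original n players is unchanged.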

Misunderstandings

Zero-sum games and particularly their solutions are commonly misunderstood by critics of game theory, usually with respect to the independence and rationality of the players, as well as to the interpretation of utility functions. Furthermore, the word "game" does not imply the model is valid only for recreational games.[1]

Politics is sometimes called zero sum.[7][8][9]

Zero-sum thinking

In psychology, zero-sum thinking refers to the perception that a situation is like a zero-sum game, where one person's gain is another's loss.


References

  1. Ken Binmore (2007). Playing for real: a text on game theory. Oxford University Press US. ISBN 978-0-19-530057-4. https://books.google.com/?id=eY0YhSk9ujsC. Chapters 1 & 7.
  2. Bowles, Samuel (2004). Microeconomics: Behavior, Institutions, and Evolution. Princeton University Press. pp. 33–36. ISBN 0-691-09163-3. https://archive.org/details/microeconomicsbe00bowl.
  3. Wenliang Wang (2015). Pooling Game Theory and Public Pension Plan. Chapter 1 and Chapter 4.
  4. Wenliang Wang (2015). Pooling Game Theory and Public Pension Plan. Chapter 4.
  5. Wenliang Wang (2015). Pooling Game Theory and Public Pension Plan. Chapter 4.
  6. Theory of Games and Economic Behavior. Princeton University Press (1953). June 25, 2005. ISBN 9780691130613. https://press.princeton.edu/titles/7802.html. Retrieved 2018-02-25.
  7. Rubin, Jennifer (2013-10-04). "The flaw in zero sum politics". The Washington Post. Retrieved 2017-03-08.
  8. "Lexington: Zero-sum politics". The Economist. 2014-02-08. Retrieved 2017-03-08.
  9. 模板:Cite dictionary
  10. Rubin, Jennifer (2013-10-04). "The flaw in zero sum politics". The Washington Post. Retrieved 2017-03-08.
  11. "Lexington: Zero-sum politics". The Economist. 2014-02-08. Retrieved 2017-03-08.
  12. 模板:Cite dictionary


Further reading

  • Misstating the Concept of Zero-Sum Games within the Context of Professional Sports Trading Strategies, series Pardon the Interruption (2010-09-23) ESPN, created by Tony Kornheiser and Michael Wilbon, performance by Bill Simmons