更改

添加2,858字节 、 2020年10月31日 (六) 11:23
第96行: 第96行:  
!
 
!
   −
== Solution ==
+
== Solution 方案==
    
!  
 
!  
第109行: 第109行:     
For two-player finite zero-sum games, the different [[game theory|game theoretic]] [[solution concept]]s of [[Nash equilibrium]], [[minimax]], and [[maximin (decision theory)|maximin]] all give the same solution. If the players are allowed to play a [[mixed strategy]], the game always has an equilibrium.
 
For two-player finite zero-sum games, the different [[game theory|game theoretic]] [[solution concept]]s of [[Nash equilibrium]], [[minimax]], and [[maximin (decision theory)|maximin]] all give the same solution. If the players are allowed to play a [[mixed strategy]], the game always has an equilibrium.
 +
 +
对于两人有限零和对策,[[Nash均衡]], [[最小最大]]和[[最大最小(决策理论)| 最大最小]]的不同的[[博弈论|博弈论]][[解概念]]都给出了相同的解。如果允许玩家玩一个[[混合策略]],游戏总是有一个平衡点。
    
!  
 
!  
第120行: 第122行:  
会发生什么
 
会发生什么
   −
=== Example ===
+
=== Example 举例===
    
| ||white}}
 
| ||white}}
第138行: 第140行:  
|-
 
|-
   −
|+ align=bottom |''A zero-sum game''
+
|+ align=bottom |''A zero-sum game''“一个零和博弈”
    
!  
 
!  
第206行: 第208行:  
Émile Borel and John von Neumann had the fundamental insight that probability provides a way out of this conundrum. Instead of deciding on a definite action to take, the two players assign probabilities to their respective actions, and then use a random device which, according to these probabilities, chooses an action for them. Each player computes the probabilities so as to minimize the maximum expected point-loss independent of the opponent's strategy. This leads to a linear programming problem with the optimal strategies for each player. This minimax method can compute probably optimal strategies for all two-player zero-sum games.
 
Émile Borel and John von Neumann had the fundamental insight that probability provides a way out of this conundrum. Instead of deciding on a definite action to take, the two players assign probabilities to their respective actions, and then use a random device which, according to these probabilities, chooses an action for them. Each player computes the probabilities so as to minimize the maximum expected point-loss independent of the opponent's strategy. This leads to a linear programming problem with the optimal strategies for each player. This minimax method can compute probably optimal strategies for all two-player zero-sum games.
   −
和约翰·冯·诺伊曼的基本观点是概率提供了一条解决这个难题的途径。两个参与者不是决定采取什么明确的行动,而是分配各自行动的可能性,然后使用一个随机设备,根据这些可能性,为他们选择一个行动。每个参与人计算概率,以最小化最大期望点损失独立于对手的策略。这就导致了每个玩家的最优策略的线性规划问题。这种极大极小方法可以计算出所有两人零和对策的最优策略。
+
和约翰·冯·诺伊曼的基本观点是概率提供了一条解决这个难题的途径。两个参与者不是决定采取什么明确的行动,而是分配各自行动的可能性,然后使用一个随机设备,根据这些可能性,为他们选择一个行动。每个参与人计算概率,以最小化最大期望点损失独立于对手的策略。这就导致了每个玩家的最优策略的线性规划问题。这种'''<font color="#ff8000"> 极大极小法Minimax method</font>'''可以计算出所有两人零和游戏的最优策略。
    
|}
 
|}
第222行: 第224行:  
The order of play proceeds as follows: The first player (red) chooses in secret one of the two actions 1 or 2; the second player (blue), unaware of the first player's choice, chooses in secret one of the three actions A, B or C. Then, the choices are revealed and each player's points total is affected according to the payoff for those choices.
 
The order of play proceeds as follows: The first player (red) chooses in secret one of the two actions 1 or 2; the second player (blue), unaware of the first player's choice, chooses in secret one of the three actions A, B or C. Then, the choices are revealed and each player's points total is affected according to the payoff for those choices.
   −
 
+
游戏顺序如下:第一个玩家(红色)秘密选择两个动作1或2中的一个;第二个玩家(蓝色)不知道第一个玩家的选择,秘密地选择A、B或C三个动作中的一个。然后,这些选择被揭示出来,并且每个玩家的积分总和会根据这些选择的回报而受到影响。
    
The Nash equilibrium for a two-player, zero-sum game can be found by solving a linear programming problem.  Suppose a zero-sum game has a payoff matrix  where element }} is the payoff obtained when the minimizing player chooses pure strategy  and the maximizing player chooses pure strategy  (i.e. the player trying to minimize the payoff chooses the row and the player trying to maximize the payoff chooses the column).  Assume every element of  is positive.  The game will have at least one Nash equilibrium.  The Nash equilibrium can be found (Raghavan 1994, p.&nbsp;740) by solving the following linear program to find a vector :
 
The Nash equilibrium for a two-player, zero-sum game can be found by solving a linear programming problem.  Suppose a zero-sum game has a payoff matrix  where element }} is the payoff obtained when the minimizing player chooses pure strategy  and the maximizing player chooses pure strategy  (i.e. the player trying to minimize the payoff chooses the row and the player trying to maximize the payoff chooses the column).  Assume every element of  is positive.  The game will have at least one Nash equilibrium.  The Nash equilibrium can be found (Raghavan 1994, p.&nbsp;740) by solving the following linear program to find a vector :
第230行: 第232行:  
''Example: Red chooses action 2 and Blue chooses action B. When the payoff is allocated, Red gains 20 points and Blue loses 20 points.''
 
''Example: Red chooses action 2 and Blue chooses action B. When the payoff is allocated, Red gains 20 points and Blue loses 20 points.''
   −
 
+
“例如:红色选择动作2,蓝色选择动作B。当分配到回报时,红色获得20点,蓝色失去20点。”
    
  Minimize:
 
  Minimize:
第237行: 第239行:     
In this example game, both players know the payoff matrix and attempt to maximize the number of their points. Red could reason as follows: "With action 2, I could lose up to 20 points and can win only 20, and with action 1 I can lose only 10 but can win up to 30, so action 1 looks a lot better." With similar reasoning, Blue would choose action C. If both players take these actions, Red will win 20 points. If Blue anticipates Red's reasoning and choice of action 1, Blue may choose action B, so as to win 10 points. If Red, in turn, anticipates this trick and goes for action 2, this wins Red 20 points.
 
In this example game, both players know the payoff matrix and attempt to maximize the number of their points. Red could reason as follows: "With action 2, I could lose up to 20 points and can win only 20, and with action 1 I can lose only 10 but can win up to 30, so action 1 looks a lot better." With similar reasoning, Blue would choose action C. If both players take these actions, Red will win 20 points. If Blue anticipates Red's reasoning and choice of action 1, Blue may choose action B, so as to win 10 points. If Red, in turn, anticipates this trick and goes for action 2, this wins Red 20 points.
 +
 +
在这个示例游戏中,两个玩家都知道收益矩阵,并试图最大化他们的点数。瑞德可以这样解释:“在第二个动作中,我最多可以输20分,但只能赢20分,而在第一个动作中,我只能输10分,但最多可以赢30分,所以第一个动作看起来好多了。”根据类似的推理,蓝队会选择动作C。如果两个选手都采取这些动作,红色将赢得20分。如果蓝色预测到红色的推理和行动1的选择,蓝色可以选择行动B,从而赢得10分。如果雷德,反过来,预期这一技巧和行动2,这赢得红20分。
    
  <math>\sum_{i} u_i</math>
 
  <math>\sum_{i} u_i</math>
第250行: 第254行:  
[[Émile Borel]] and [[John von Neumann]] had the fundamental insight that [[probability]] provides a way out of this conundrum. Instead of deciding on a definite action to take, the two players assign probabilities to their respective actions, and then use a random device which, according to these probabilities, chooses an action for them. Each player computes the probabilities so as to minimize the maximum [[expected value|expected]] point-loss independent of the opponent's strategy. This leads to a [[linear programming]] problem with the optimal strategies for each player. This [[minimax]] method can compute probably optimal strategies for all two-player zero-sum games.
 
[[Émile Borel]] and [[John von Neumann]] had the fundamental insight that [[probability]] provides a way out of this conundrum. Instead of deciding on a definite action to take, the two players assign probabilities to their respective actions, and then use a random device which, according to these probabilities, chooses an action for them. Each player computes the probabilities so as to minimize the maximum [[expected value|expected]] point-loss independent of the opponent's strategy. This leads to a [[linear programming]] problem with the optimal strategies for each player. This [[minimax]] method can compute probably optimal strategies for all two-player zero-sum games.
   −
 
+
[[Émile Borel]]和[[John von Neumann]]的基本见解是[[概率]]提供了一种解决这个难题的方法。这两个玩家没有决定要采取的明确行动,而是给他们各自的行动分配概率,然后使用一个随机装置,根据这些概率,为他们选择一个行动。每个玩家计算概率,以使最大[[期望值|预期]]点损失最小化,与对手的策略无关。这就导致了一个[[线性规划]]问题,每个参与者的最优策略。这种[[极大极小]]方法可以计算所有两人零和博弈的可能最优策略。
    
  .
 
  .
第258行: 第262行:  
For the example given above, it turns out that Red should choose action 1 with probability {{sfrac|4|7}} and action 2 with probability {{sfrac|3|7}}, and Blue should assign the probabilities 0, {{sfrac|4|7}}, and {{sfrac|3|7}} to the three actions A, B, and C. Red will then win {{sfrac|20|7}} points on average per game.
 
For the example given above, it turns out that Red should choose action 1 with probability {{sfrac|4|7}} and action 2 with probability {{sfrac|3|7}}, and Blue should assign the probabilities 0, {{sfrac|4|7}}, and {{sfrac|3|7}} to the three actions A, B, and C. Red will then win {{sfrac|20|7}} points on average per game.
   −
 
+
上面给出的例子中,红色应该选择行动1,概率 {{sfrac|4|7}} 和行动2,概率{{sfrac|3|7}}, 而蓝色应将概率0,{{sfrac|4|7}}, 和 {{sfrac|3|7}} 分配给A、B、C三个行动。红色平均每场比赛将赢得{sfrac | 20 | 7}点。
    
The first constraint says each element of the  vector must be nonnegative, and the second constraint says each element of the  vector must be at least 1.  For the resulting  vector, the inverse of the sum of its elements is the value of the game.  Multiplying  by that value gives a probability vector, giving the probability that the maximizing player will choose each of the possible pure strategies.
 
The first constraint says each element of the  vector must be nonnegative, and the second constraint says each element of the  vector must be at least 1.  For the resulting  vector, the inverse of the sum of its elements is the value of the game.  Multiplying  by that value gives a probability vector, giving the probability that the maximizing player will choose each of the possible pure strategies.
第264行: 第268行:  
第一个约束表示向量的每个元素必须是非负的,第二个约束表示向量的每个元素必须至少是1。对于得到的向量,其元素和的倒数是游戏的值。乘以这个值得到一个概率向量,给出了最大化的玩家选择每个可能的纯策略的概率。
 
第一个约束表示向量的每个元素必须是非负的,第二个约束表示向量的每个元素必须至少是1。对于得到的向量,其元素和的倒数是游戏的值。乘以这个值得到一个概率向量,给出了最大化的玩家选择每个可能的纯策略的概率。
   −
=== Solving ===
+
=== Solving 解答===
      第274行: 第278行:  
The [[Nash equilibrium]] for a two-player, zero-sum game can be found by solving a [[linear programming]] problem.  Suppose a zero-sum game has a payoff matrix {{mvar|M}} where element {{math|''M''{{sub|''i'',''j''}}}} is the payoff obtained when the minimizing player chooses pure strategy {{mvar|i}} and the maximizing player chooses pure strategy {{mvar|j}} (i.e. the player trying to minimize the payoff chooses the row and the player trying to maximize the payoff chooses the column).  Assume every element of {{mvar|M}} is positive.  The game will have at least one Nash equilibrium.  The Nash equilibrium can be found (Raghavan 1994, p.&nbsp;740) by solving the following linear program to find a vector {{mvar|u}}:
 
The [[Nash equilibrium]] for a two-player, zero-sum game can be found by solving a [[linear programming]] problem.  Suppose a zero-sum game has a payoff matrix {{mvar|M}} where element {{math|''M''{{sub|''i'',''j''}}}} is the payoff obtained when the minimizing player chooses pure strategy {{mvar|i}} and the maximizing player chooses pure strategy {{mvar|j}} (i.e. the player trying to minimize the payoff chooses the row and the player trying to maximize the payoff chooses the column).  Assume every element of {{mvar|M}} is positive.  The game will have at least one Nash equilibrium.  The Nash equilibrium can be found (Raghavan 1994, p.&nbsp;740) by solving the following linear program to find a vector {{mvar|u}}:
   −
 
+
一个两人零和博弈的[[纳什均衡]]可以通过求解一个[[线性规划]]问题得到。假设一个零和博弈有一个支付矩阵{mvar | M}},其中元素{math | M'{sub |“i”,“j''}}}}是当最小化的玩家选择纯策略{mvar | i}}而最大化的玩家选择纯策略{mvar | j}}时获得的收益(即,试图最小化收益的玩家选择行,而试图最大化收益的玩家选择列)。假设{mvar | M}的每个元素都是正的。博弈至少有一个纳什均衡。纳什均衡可以通过求解以下线性规划找到向量{mvar | u}来找到(Raghavan 1994,p.740):
    
The equilibrium mixed strategy for the minimizing player can be found by solving the dual of the given linear program.  Or, it can be found by using the above procedure to solve a modified payoff matrix which is the transpose and negation of  (adding a constant so it's positive), then solving the resulting game.
 
The equilibrium mixed strategy for the minimizing player can be found by solving the dual of the given linear program.  Or, it can be found by using the above procedure to solve a modified payoff matrix which is the transpose and negation of  (adding a constant so it's positive), then solving the resulting game.
第361行: 第365行:     
政治有时被称为零和游戏。
 
政治有时被称为零和游戏。
  −
  −
  −
      
== Extensions ==
 
== Extensions ==
561

个编辑