Changes

No change in size, 19:42, 25 January 2021 (Monday)
No edit summary
An extended "iterated" version of the game also exists. In this version, the classic game is played repeatedly between the same prisoners, who therefore have continual opportunities to penalize each other for previous decisions. If the number of rounds is known to the players, then (by backward induction) two classically rational players will betray each other in every round, for the same reasons as in the single-shot variant. In a game of infinite or unknown length there is no fixed optimal strategy, and prisoner's dilemma tournaments have been held to compare and test algorithms for such cases.<ref>{{cite journal|url = https://egtheory.wordpress.com/2015/03/02/ipd/|title = Short history of iterated prisoner's dilemma tournaments|date = March 2, 2015|access-date = February 8, 2016|journal = Journal of Conflict Resolution|volume = 24|issue = 3|pages = 379–403|last = Kaznatcheev|first = Artem|doi = 10.1177/002200278002400301}}</ref>
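The backward-induction argument can be traced in a few lines of Python. This is a minimal sketch rather than a general game solver: it assumes the common payoff values T = 5, R = 3, P = 1, S = 0 (any values with T > R > P > S behave the same way), and the function names are hypothetical. Because equilibrium play in later rounds is already fixed and does not depend on the current move, each round reduces to the one-shot game, where defection strictly dominates.

```python
# Assumed payoffs (not stated in this section); any T > R > P > S behaves the same.
T, R, P, S = 5, 3, 1, 0


def stage_payoff(mine, other):
    """One-shot payoff to a player who plays `mine` against `other`."""
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(mine, other)]


def dominant_action(continuation):
    """Return the action that does strictly better against every opponent move.

    `continuation` is the equilibrium payoff of all later rounds; it is the same
    whichever action is chosen now, so it cannot change the comparison.
    """
    for mine, alt in (("C", "D"), ("D", "C")):
        if all(stage_payoff(mine, o) + continuation > stage_payoff(alt, o) + continuation
               for o in ("C", "D")):
            return mine
    return None


def backward_induction(n_rounds):
    """Solve a game with a known number of rounds, from the last round back.

    Returns the equilibrium action for each round (first round first) and the
    total equilibrium payoff per player.
    """
    plan, continuation = [], 0
    for _ in range(n_rounds):
        action = dominant_action(continuation)
        plan.append(action)
        # The game is symmetric, so both players choose the same action.
        continuation += stage_payoff(action, action)
    plan.reverse()  # rounds were solved last-to-first
    return plan, continuation


print(backward_induction(5))  # (['D', 'D', 'D', 'D', 'D'], 5)
```

The continuation value is added to both sides of every comparison, which is exactly why it never changes the answer: the last round unravels the one before it, and so on back to the first.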
The prisoner's dilemma game can be used as a model for many real-world situations involving cooperative behavior. In casual usage, the label "prisoner's dilemma" may be applied to situations that do not strictly match the formal criteria of the classic or iterated games: for instance, those in which two entities could gain important benefits from cooperating, or suffer from failing to do so, but find it difficult or expensive (though not necessarily impossible) to coordinate their activities.
Note that {{tmath|2R>T+S}} (i.e. {{tmath|2(b-c)>b-c}}, which holds whenever {{tmath|b>c}}), so the donation game also satisfies the extra condition required of an iterated game (see next section).
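The inequality can be checked mechanically. The sketch below assumes the usual donation-game mapping T = b, R = b − c, P = 0, S = −c (cooperating gives the opponent a benefit b at a personal cost c); under that mapping 2R = 2(b − c) and T + S = b − c, so the iterated condition holds exactly when b > c. The function names are illustrative only.

```python
def donation_payoffs(b, c):
    """Prisoner's dilemma payoffs (T, R, P, S) of a donation game with
    benefit b and cost c (assumed standard mapping)."""
    return b, b - c, 0, -c


def is_iterated_pd(T, R, P, S):
    """True if the payoffs satisfy T > R > P > S and the extra condition 2R > T + S."""
    return T > R > P > S and 2 * R > T + S


# Benefit 3, cost 1: T=3, R=2, P=0, S=-1, and 2R = 4 > T + S = 2.
print(is_iterated_pd(*donation_payoffs(3, 1)))  # True
```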
==The iterated prisoner's dilemma==
{{more citations needed section|date=November 2012}}
    
If two players play the prisoner's dilemma more than once in succession, remember their opponent's previous actions, and change their strategy accordingly, the game is called the iterated prisoner's dilemma.
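As a concrete reading of this definition, the sketch below plays repeated rounds in which each strategy is a function of the opponent's past moves. It is a minimal illustration under assumed payoffs (T = 5, R = 3, P = 1, S = 0); `play_iterated`, `grudger` and `alternator` are hypothetical helpers, not part of any library.

```python
# Assumed payoffs: (X's payoff, Y's payoff) for each pair of moves.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}


def play_iterated(strategy_x, strategy_y, rounds):
    """Play `rounds` rounds; each strategy sees the opponent's previous moves."""
    moves_x, moves_y = [], []
    score_x = score_y = 0
    for _ in range(rounds):
        x = strategy_x(moves_y)  # X reacts to what Y has done so far
        y = strategy_y(moves_x)  # Y reacts to what X has done so far
        px, py = PAYOFF[(x, y)]
        score_x += px
        score_y += py
        moves_x.append(x)
        moves_y.append(y)
    return score_x, score_y


def grudger(opp_moves):
    """Grim trigger: cooperate until the opponent defects once, then always defect."""
    return "D" if "D" in opp_moves else "C"


def alternator(opp_moves):
    """Ignores the opponent entirely: C, D, C, D, ..."""
    return "C" if len(opp_moves) % 2 == 0 else "D"


print(play_iterated(grudger, alternator, 4))  # (9, 9)
```

The grudger remembers the alternator's defection in round 2 and punishes it from round 3 on, which is the kind of history-dependent reaction the definition describes.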
In addition to the general form above, the iterated version also requires that {{tmath|2R > T + S}}, to prevent alternating cooperation and defection from giving a greater reward than mutual cooperation.
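A quick numerical check shows why the condition is needed: if two players take turns being the lone defector, each earns (T + S)/2 per round on average, which must stay below the mutual-cooperation payoff R. The payoff values below are assumptions for illustration (P plays no role in this comparison); the second set still satisfies T > R > P > S with P = 1 but deliberately violates 2R > T + S.

```python
def per_round_averages(T, R, S, rounds=100):
    """Average per-round payoff to one player under two schemes:
    taking turns as the lone defector (T, S, T, S, ...) versus steady mutual cooperation (R)."""
    alternating = sum(T if i % 2 == 0 else S for i in range(rounds)) / rounds
    return alternating, R


print(per_round_averages(5, 3, 0))  # (2.5, 3): 2R = 6 > T + S = 5, cooperating pays more
print(per_round_averages(7, 3, 0))  # (3.5, 3): 2R = 6 < T + S = 7, alternating pays more
```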
The iterated prisoner's dilemma game is fundamental to some theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behavior in populations may be modeled by a multi-player, iterated version of the game. It has consequently fascinated many scholars over the years: in 1975, Grofman and Pool estimated that more than 2,000 scholarly articles had been devoted to it. The iterated prisoner's dilemma has also been referred to as the "peace-war game".<ref name = Shy>{{cite book | title= Industrial Organization: Theory and Applications | publisher=Massachusetts Institute of Technology Press | first1= Oz | last1=Shy |url=https://books.google.com/?id=tr4CjJ5LlRcC&pg=PR13&dq=industrial+organization+theory+and+applications  | year=1995 | isbn=978-0262193665 | accessdate=February 27, 2013}}</ref>
Unlike in the standard prisoner's dilemma, in the iterated prisoner's dilemma the defection strategy is counter-intuitive and predicts the behavior of human players poorly. Within standard economic theory, however, defection is the only correct answer. The superrational strategy in the iterated prisoner's dilemma with fixed N is to cooperate against a superrational opponent, and in the limit of large N, experimental results on strategies agree with the superrational version, not the game-theoretic rational one.
According to a 2019 experimental study in the ''American Economic Review'' that tested which strategies real-life subjects used in iterated prisoner's dilemma situations with perfect monitoring, the majority of chosen strategies were always defect, tit for tat, and grim trigger. Which strategy the subjects chose depended on the parameters of the game.<ref>{{Cite journal|last=Dal Bó|first=Pedro|last2=Fréchette|first2=Guillaume R.|date=2019|title=Strategy Choice in the Infinitely Repeated Prisoner's Dilemma|journal=American Economic Review|language=en|volume=109|issue=11|pages=3929–3952|doi=10.1257/aer.20181480|issn=0002-8282}}</ref>
          
===Strategy for the iterated prisoner's dilemma===
 
Interest in the iterated prisoner's dilemma (IPD) was kindled by [[Robert Axelrod]] in his book ''[[The Evolution of Cooperation]]'' (1984). In it he reports on a tournament he organized of the ''N''-step prisoner's dilemma (with ''N'' fixed), in which participants have to choose their strategy again and again and can remember their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.
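A round-robin tournament of the kind described here can be sketched as follows. This is a toy version under stated assumptions: payoffs T = 5, R = 3, P = 1, S = 0, a fixed 200 rounds per match, every entrant meeting every other entrant once (whether to include self-play is a design choice this sketch leaves out), and a handful of simple entrants chosen for illustration rather than the actual tournament entries.

```python
from itertools import combinations

T, R, P, S = 5, 3, 1, 0  # assumed payoffs
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}


# Each strategy receives the opponent's previous moves and returns "C" or "D".
def tit_for_tat(opp):
    return opp[-1] if opp else "C"


def always_defect(opp):
    return "D"


def always_cooperate(opp):
    return "C"


def grudger(opp):
    return "D" if "D" in opp else "C"


def match(a, b, rounds):
    """Play one match and return both total scores."""
    moves_a, moves_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        x, y = a(moves_b), b(moves_a)
        pa, pb = PAYOFF[(x, y)]
        score_a += pa
        score_b += pb
        moves_a.append(x)
        moves_b.append(y)
    return score_a, score_b


def round_robin(entrants, rounds=200):
    """Every pair of entrants plays one match; return (name, total) sorted by total."""
    totals = {name: 0 for name in entrants}
    for (name_a, a), (name_b, b) in combinations(entrants.items(), 2):
        sa, sb = match(a, b, rounds)
        totals[name_a] += sa
        totals[name_b] += sb
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


entrants = {"tit_for_tat": tit_for_tat, "always_defect": always_defect, "grudger": grudger}
print(round_robin(entrants))  # [('tit_for_tat', 799), ('grudger', 799), ('always_defect', 408)]
```

The ranking depends heavily on the field: adding `always_cooperate` to these entrants, for example, hands `always_defect` enough easy points to finish first.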
The winning deterministic strategy was tit for tat, which Anatol Rapoport developed and entered into the tournament. It was the simplest of all the programs entered, containing only four lines of BASIC, and it won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, the player does whatever the opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness": when the opponent defects, the player sometimes cooperates on the next move anyway, with a small probability (around 1–5%). This allows occasional recovery from being trapped in a cycle of defections. The exact probability depends on the line-up of opponents.
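Both strategies are short enough to write out directly. In the sketch below, `forgiveness` is the small probability (roughly 1–5% according to the paragraph above) of cooperating anyway after an opponent's defection; the 3% default and the function names are illustrative assumptions. Passing a seeded `random.Random` instance as `rng` makes runs reproducible.

```python
import random


def tit_for_tat(opp_moves):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_moves[-1] if opp_moves else "C"


def forgiving_tit_for_tat(opp_moves, forgiveness=0.03, rng=random):
    """Like tit for tat, but after a defection cooperate anyway with
    probability `forgiveness`, which can break a cycle of retaliation."""
    if not opp_moves or opp_moves[-1] == "C":
        return "C"
    return "C" if rng.random() < forgiveness else "D"


# One accidental defection makes two plain tit-for-tat players retaliate back and forth.
x_moves, y_moves = ["D"], ["C"]  # X defected once by mistake
for _ in range(6):
    x_next, y_next = tit_for_tat(y_moves), tit_for_tat(x_moves)
    x_moves.append(x_next)
    y_moves.append(y_next)
print("".join(x_moves), "".join(y_moves))  # DCDCDCD CDCDCDC
```

Replacing either player with `forgiving_tit_for_tat` gives the echo a small chance of ending each time that player faces a defection.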
The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated PD game the optimal strategy depends on the strategies of likely opponents and on how they will react to defection and cooperation. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest playing tit for tat, the optimal strategy for an individual depends on that percentage and on the length of the game.
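The example can be made concrete by computing expected scores in a population where a fraction of players always defect and the rest play tit for tat. The payoffs (T = 5, R = 3, P = 1, S = 0), the 10-round match length, and the helper names are assumptions for illustration.

```python
T, R, P, S = 5, 3, 1, 0  # assumed payoffs
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}  # payoff to the first mover listed


def tit_for_tat(opp_moves):
    return opp_moves[-1] if opp_moves else "C"


def always_defect(opp_moves):
    return "D"


def match_score(me, opponent, rounds):
    """Total payoff to `me` over one match of `rounds` rounds."""
    my_moves, opp_moves, score = [], [], 0
    for _ in range(rounds):
        mine, theirs = me(opp_moves), opponent(my_moves)
        score += PAYOFF[(mine, theirs)]
        my_moves.append(mine)
        opp_moves.append(theirs)
    return score


def expected_score(strategy, frac_defectors, rounds=10):
    """Expected payoff against a random member of a population in which a fraction
    `frac_defectors` always defects and everyone else plays tit for tat."""
    return (frac_defectors * match_score(strategy, always_defect, rounds)
            + (1 - frac_defectors) * match_score(strategy, tit_for_tat, rounds))


print(expected_score(tit_for_tat, 1.0), expected_score(always_defect, 1.0))  # 9.0 10.0
print(expected_score(tit_for_tat, 0.5), expected_score(always_defect, 0.5))  # 19.5 12.0
```

Among pure defectors, the lone tit-for-tat player loses exactly the first-round difference (9 against 10). With these numbers tit for tat scores 30 − 21p and always-defect 14 − 4p for a defector fraction p, so tit for tat does better whenever p < 16/17 (about 94%); a longer match moves that threshold.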
Although tit for tat is considered the most robust basic strategy, a team from the University of Southampton in England introduced a new strategy at the 20th-anniversary iterated prisoner's dilemma competition that proved more successful than tit for tat. This strategy relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start.<ref>{{cite press release|url= http://www.southampton.ac.uk/mediacentre/news/2004/oct/04_151.shtml|publisher=University of Southampton|title=University of Southampton team wins Prisoner's Dilemma competition|date=7 October 2004|url-status=dead|archive-url= https://web.archive.org/web/20140421055745/http://www.southampton.ac.uk/mediacentre/news/2004/oct/04_151.shtml|archive-date=2014-04-21}}</ref> Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the competing program's score. As a result, this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.

The 2004 Prisoners' Dilemma Tournament results show the University of Southampton's strategies in the first three places, despite their having fewer wins and many more losses than the GRIM strategy. (In a PD tournament, the aim of the game is not to "win" matches, which can easily be achieved by frequent defection.) Also, even without the implicit collusion between software strategies exploited by the Southampton team, tit for tat is not always the absolute winner of any given tournament; it would be more precise to say that its long-run results over a series of tournaments outperform those of its rivals. (In any one event a given strategy can be slightly better adjusted to the competition than tit for tat, but tit for tat is more robust.) The same applies to the tit for tat with forgiveness variant and to other optimal strategies: on any given day they might not "win" against a specific mix of counter-strategies. An alternative way of putting it is in terms of a Darwinian ESS (evolutionarily stable strategy) simulation. In such a simulation, tit for tat will almost always come to dominate, though nasty strategies will drift in and out of the population, because a tit for tat population can be invaded by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Richard Dawkins showed that here no static mix of strategies forms a stable equilibrium, and the system will always oscillate between bounds.
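The master-and-slave collusion idea can be sketched as follows. The actual Southampton identification sequence is not given here, so the five-move `HANDSHAKE` below is a hypothetical stand-in, as are the role names and helpers; the payoffs T = 5, R = 3, P = 1, S = 0 are also assumed.

```python
T, R, P, S = 5, 3, 1, 0  # assumed payoffs
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T), ("D", "C"): (T, S), ("D", "D"): (P, P)}

# Hypothetical identification code; the real Southampton sequence is not specified here.
HANDSHAKE = ["C", "D", "D", "C", "D"]


def make_colluder(role):
    """Build a colluding strategy; `role` is "master" (collects points)
    or "slave" (feeds points to a recognised teammate)."""
    def strategy(my_moves, opp_moves):
        turn = len(my_moves)
        if turn < len(HANDSHAKE):
            return HANDSHAKE[turn]  # still signalling identity
        if opp_moves[:len(HANDSHAKE)] == HANDSHAKE:
            return "C" if role == "slave" else "D"  # teammate recognised
        return "D"  # outsider: defect every round to drag its score down
    return strategy


def tit_for_tat(my_moves, opp_moves):
    return opp_moves[-1] if opp_moves else "C"


def play(a, b, rounds):
    """Play one match; each strategy receives (its own moves, opponent's moves)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        x, y = a(moves_a, moves_b), b(moves_b, moves_a)
        pa, pb = PAYOFF[(x, y)]
        score_a += pa
        score_b += pb
        moves_a.append(x)
        moves_b.append(y)
    return score_a, score_b


master, slave = make_colluder("master"), make_colluder("slave")
print(play(master, slave, 10))        # (34, 9)
print(play(master, tit_for_tat, 10))  # (19, 14)
```

The slave sacrifices its own score, so a single team can place some programs near the top and others near the bottom, as the paragraph above reports.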
===Stochastic iterated prisoner's dilemma===
In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities".<ref name=Press2012>{{cite journal|last1=Press|first1=WH|last2=Dyson|first2=FJ|title=Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|date=26 June 2012|volume=109|issue=26|pages=10409–13|doi=10.1073/pnas.1206569109|pmid=22615375|pmc=3387070|bibcode=2012PNAS..10910409P}}</ref> In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where <math>P_{ab}</math> is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab), with X's move listed first. For example, if the previous encounter was one in which X cooperated and Y defected, then <math>P_{cd}</math> is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is tit for tat, written as <math>P=\{1,0,1,0\}</math>, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy, written as <math>P=\{1,0,0,1\}</math>, in which X repeats its previous move if the previous encounter was a "win" (i.e. cc or dc) but switches if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.<ref name="Press2012"/>
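A memory-1 strategy is just a lookup from the previous outcome to a cooperation probability, which the sketch below makes explicit. It is an illustration, not taken from the cited paper: the helper names are hypothetical, and each player reads the previous outcome from its own side (own move first), matching the <math>P_{ab}</math> convention above.

```python
import random

# Position of each previous outcome (own move, opponent's move) in the vector P.
INDEX = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}

TIT_FOR_TAT = (1, 0, 1, 0)
WIN_STAY_LOSE_SWITCH = (1, 0, 0, 1)


def is_deterministic(p):
    """A memory-1 strategy is deterministic if every probability is 0 or 1."""
    return all(prob in (0, 1) for prob in p)


def next_move(p, my_last, opp_last, rng=random):
    """Cooperate with probability P_ab, where (a, b) = (my_last, opp_last)."""
    return "C" if rng.random() < p[INDEX[(my_last, opp_last)]] else "D"


def play_memory_one(p, q, first_x, first_y, rounds, rng=random):
    """Return the sequence of (X, Y) outcomes, starting from the given first moves."""
    x, y = first_x, first_y
    history = [(x, y)]
    for _ in range(rounds - 1):
        # Both players react to the same previous round, each from its own side.
        x, y = next_move(p, x, y, rng), next_move(q, y, x, rng)
        history.append((x, y))
    return history


print(play_memory_one(TIT_FOR_TAT, TIT_FOR_TAT, "C", "D", 4))
# [('C', 'D'), ('D', 'C'), ('C', 'D'), ('D', 'C')]
print(play_memory_one(WIN_STAY_LOSE_SWITCH, WIN_STAY_LOSE_SWITCH, "C", "D", 4))
# [('C', 'D'), ('D', 'D'), ('C', 'C'), ('C', 'C')]
```

Starting from the same mismatched first round, two tit-for-tat players lock into alternating retaliation, whereas two win–stay, lose–switch players return to mutual cooperation after one round of mutual defection. Because `rng.random()` lies in [0, 1), probabilities of exactly 0 and 1 always behave deterministically.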
The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma (IPD), illustrated in a [[Venn diagram]]. Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning that no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that is always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies under either strong or weak selection. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit for tat lies at the intersection of cooperating, defecting and ZD strategies.
In 2012, William H. Press and Freeman Dyson published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies.<ref name="Press2012"/> The long-term payoffs for encounters between ''X'' and ''Y'' can be expressed through determinants of a matrix. That matrix is a function of the two strategies and the short-term payoff vectors: <math>s_x=D(P,Q,S_x)/D(P,Q,U)</math> and <math>s_y=D(P,Q,S_y)/D(P,Q,U)</math>. These expressions do not involve the stationary vector ''v''. Since the determinant function <math>D(P,Q,f)</math> is linear in ''f'', it follows that <math>\alpha s_x+\beta s_y+\gamma=D(P,Q,\alpha S_x+\beta S_y+\gamma U)/D(P,Q,U)</math> (where ''U''={1,1,1,1}). Any strategy for which <math>D(P,Q,\alpha S_x+\beta S_y+\gamma U)=0</math> is by definition a ZD strategy, and its long-term payoffs obey the linear relation <math>\alpha s_x+\beta s_y+\gamma=0</math>. Player ''X'' can make this determinant vanish for every opponent strategy ''Q'' by choosing ''P'' so that <math>(P_{cc}-1,\,P_{cd}-1,\,P_{dc},\,P_{dd})=\alpha S_x+\beta S_y+\gamma U</math>, because two columns of the matrix then coincide.
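The linear relation can be checked numerically. The minimal Python sketch below makes several assumptions: the function names are hypothetical and the payoffs are the conventional 3, 0, 5, 1. It computes long-term payoffs from the stationary distribution of the four-state Markov chain, which agrees with the determinant formula whenever that distribution is unique. It then builds an extortionate ZD strategy by setting <math>(P_{cc}-1,P_{cd}-1,P_{dc},P_{dd})=\phi\,[(S_x-U)-\chi\,(S_y-U)]</math>. The values <math>\chi=3</math> and <math>\phi=1/26</math> are illustrative choices, and 1 is the mutual-defection payoff.

```python
# Conventional payoffs (an assumption of this sketch).
REWARD, SUCKER, TEMPTATION, PUNISHMENT = 3.0, 0.0, 5.0, 1.0
S_X = (REWARD, SUCKER, TEMPTATION, PUNISHMENT)  # X's payoff in cc, cd, dc, dd
S_Y = (REWARD, TEMPTATION, SUCKER, PUNISHMENT)  # Y's payoff in the same states


def transition_matrix(p, q):
    """Markov matrix over (cc, cd, dc, dd) seen from X; q is written from Y's side."""
    q_seen_from_x = (q[0], q[2], q[1], q[3])  # Y sees cd as dc and vice versa
    return [[px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
            for px, qy in zip(p, q_seen_from_x)]


def stationary(m):
    """Solve v M = v with sum(v) = 1 by Gauss-Jordan elimination (assumes uniqueness)."""
    n = len(m)
    a = [[m[j][i] - (1.0 if i == j else 0.0) for j in range(n)] + [0.0]
         for i in range(n - 1)]
    a.append([1.0] * n + [1.0])  # normalisation replaces one redundant equation
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def long_run_payoffs(p, q):
    """Long-term average payoffs (s_x, s_y) for memory-1 strategies p and q."""
    v = stationary(transition_matrix(p, q))
    s_x = sum(vi * pay for vi, pay in zip(v, S_X))
    s_y = sum(vi * pay for vi, pay in zip(v, S_Y))
    return s_x, s_y


def extortionate(chi, phi):
    """Strategy whose shifted vector equals phi*[(S_X - PUNISHMENT) - chi*(S_Y - PUNISHMENT)]."""
    tilde = [phi * ((sx - PUNISHMENT) - chi * (sy - PUNISHMENT)) for sx, sy in zip(S_X, S_Y)]
    # The shifted vector is (P_cc - 1, P_cd - 1, P_dc, P_dd), so undo the shift.
    return (1 + tilde[0], 1 + tilde[1], tilde[2], tilde[3])
```

With these values, `extortionate(3.0, 1 / 26)` is approximately (11/13, 1/2, 7/26, 0). For opponents such as Q = (0.9, 0.2, 0.7, 0.1), `long_run_payoffs` then satisfies <math>s_x-1=3\,(s_y-1)</math> up to rounding. Computing payoffs from the stationary vector rather than from determinants is a design choice made here for readability.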
Tit for tat is a ZD strategy that is "fair" in the sense of not gaining an advantage over the other player. However, the ZD space also contains other kinds of strategies. In the two-player case, these allow one player to unilaterally set the other player's score, or alternatively to force an evolutionary opponent to accept a payoff some percentage lower than the extorter's own. The extorted player could defect, but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of ultimatum game. Specifically, ''X'' is able to choose a strategy for which <math>D(P,Q,\beta S_y+\gamma U)=0</math>. This unilaterally sets <math>s_y</math> to a specific value within a particular range, independent of the strategy of ''Y''; with conventional payoffs, the range lies between the mutual-defection and mutual-cooperation payoffs. This offers ''X'' an opportunity to "extort" player ''Y'' (and vice versa). (It turns out that if ''X'' tries to set <math>s_x</math> to a particular value, the range of possibilities is much smaller, consisting only of complete cooperation or complete defection.<ref name="Press2012"/>)
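As a worked check of the score-setting case, assume the conventional payoffs: 3 for mutual cooperation, 1 for mutual defection, 5 for the temptation and 0 for the sucker's payoff. With outcomes ordered cc, cd, dc, dd from the point of view of ''X'', this gives <math>S_y=\{3,5,0,1\}</math>. Take the illustrative strategy <math>P=\{2/3,0,2/3,1/3\}</math> for ''X''. The shifted vector <math>\tilde P=(P_{cc}-1,P_{cd}-1,P_{dc},P_{dd})</math> is then a combination of <math>S_y</math> and <math>U</math> alone:
:<math>\tilde P=\left(-\tfrac{1}{3},\,-1,\,\tfrac{2}{3},\,\tfrac{1}{3}\right)=-\tfrac{1}{3}\,(3,5,0,1)+\tfrac{2}{3}\,(1,1,1,1),</math>
so <math>\beta=-\tfrac{1}{3}</math> and <math>\gamma=\tfrac{2}{3}</math>. The relation <math>\beta s_y+\gamma=0</math> then fixes
:<math>s_y=-\gamma/\beta=2</math>
whatever strategy ''Y'' plays, provided play settles into a unique stationary distribution. The value 2 lies between the mutual-defection payoff 1 and the mutual-cooperation payoff 3.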
An extension of the IPD is the evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may work by having less successful players imitate more successful strategies. Alternatively, less successful players may be eliminated from the game while the more successful ones multiply. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do). It must also perform well against other players of the same type, which extortionary ZD players do poorly, because they reduce each other's surplus.<ref>{{cite journal|last=Adami|first=Christoph|author2=Arend Hintze|title=Evolutionary instability of Zero Determinant strategies demonstrates that winning isn't everything|journal=Nature Communications|volume=4|year=2013|page=3|arxiv=1208.2666|doi=10.1038/ncomms3193|pmid=23903782|pmc=3741637|bibcode=2013NatCo...4.2193A}}</ref>
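A short argument shows why extortioners do poorly against each other. This is a sketch under two assumptions. First, both players use the same extortionate ZD strategy, enforcing <math>s_x-\pi_D=\chi\,(s_y-\pi_D)</math> with extortion factor <math>\chi>1</math>; here <math>\pi_D</math> denotes the mutual-defection payoff. Second, the resulting play has a unique stationary distribution. By symmetry the two players then earn the same long-term payoff <math>s_x=s_y=s</math>, so
:<math>s-\pi_D=\chi\,(s-\pi_D)\;\Longrightarrow\;(\chi-1)(s-\pi_D)=0\;\Longrightarrow\;s=\pi_D.</math>
Two extortioners therefore earn only the mutual-defection payoff against each other, while a pair of mutual cooperators would each earn the larger mutual-cooperation payoff.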
While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. In fact, when the population is not too small, these strategies can supplant any other ZD strategy. They even perform well against a broad array of generic strategies for the iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the donation game by Alexander Stewart and Joshua Plotkin in 2013.<ref name=Stewart2013>{{cite journal|last=Stewart|first=Alexander J.|author2=Joshua B. Plotkin|title=From extortion to generosity, evolution in the Iterated Prisoner's Dilemma|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|year=2013|doi=10.1073/pnas.1306246110|pmid=24003115|volume=110|issue=38|pages=15348–53|bibcode=2013PNAS..11015348S|pmc=3780848}}</ref> Generous strategies cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies. Akin (2013)<ref name=Akin2013>{{cite arxiv|last=Akin|first=Ethan|title=Stable Cooperative Solutions for the Iterated Prisoner's Dilemma|year=2013|page=9|class=math.DS|eprint=1211.0969}} {{bibcode|2012arXiv1211.0969A}}</ref> defined good strategies as those for which the player responds to past mutual cooperation with future cooperation. Such a player also splits expected payoffs equally if he receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.<ref name=Stewart2013 />
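The sense in which a generous player "loses more utility than its rival" can be made explicit. This sketch assumes one common parametrization in which a generous ZD strategy enforces <math>s_x-\pi_C=\chi\,(s_y-\pi_C)</math> with <math>\chi\ge 1</math>, where <math>\pi_C</math> is the mutual-cooperation payoff. Rearranging gives
:<math>s_x-s_y=(\chi-1)\,(s_y-\pi_C).</math>
Whenever defection pushes the opponent below the cooperative payoff (<math>s_y\le\pi_C</math>), the generous player therefore ends up with <math>s_x\le s_y</math>, and its shortfall from <math>\pi_C</math> is <math>\chi</math> times that of its opponent. Against another player using the same generous strategy with <math>\chi>1</math>, symmetry gives <math>s_x=s_y=\pi_C</math>. For instance, take conventional payoffs (3 for mutual cooperation, 1 for mutual defection, 5 and 0 for the temptation and sucker's payoffs) and <math>\chi=2</math>. One strategy of this kind is then <math>P=\{1,\tfrac{1}{8},1,\tfrac{1}{4}\}</math>, which enforces <math>s_x-3=2\,(s_y-3)</math>.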
          
===Continuous iterated prisoner's dilemma===
 
 
Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd<ref>{{cite journal | last1 = Le | first1 = S. | last2 = Boyd | first2 = R. |name-list-format=vanc| year = 2007 | title = Evolutionary Dynamics of the Continuous Iterated Prisoner's Dilemma | url = | journal = Journal of Theoretical Biology | volume = 245 | issue = 2| pages = 258–67 | doi = 10.1016/j.jtbi.2006.09.016 | pmid = 17125798 }}</ref> found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. The basic intuition for this result is straightforward. In a continuous prisoner's dilemma, suppose a population starts off in a non-cooperative equilibrium. Players who are only marginally more cooperative than non-cooperators then get little benefit from [[Assortative mating|assorting]] with one another. By contrast, in a discrete prisoner's dilemma, tit-for-tat cooperators in a non-cooperative equilibrium get a big payoff boost, relative to non-cooperators, from assorting with one another. Nature arguably offers more opportunities for variable cooperation than for a strict dichotomy of cooperation or defection. The continuous prisoner's dilemma may therefore help explain why real-life examples of tit-for-tat-like cooperation are extremely rare in nature (e.g. Hammerstein<ref>Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, Editor, Genetic and Cultural Evolution of Cooperation, MIT Press. pp. 83–94. </ref>), even though tit for tat seems robust in theoretical models.
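The intuition about marginal cooperators can be illustrated with a deliberately simplified model; this is a hypothetical sketch, not Le and Boyd's actual model. Investing ''x'' between 0 and 1 costs the investor ''c''·''x'' and gives the partner ''b''·''x'', with ''b'' = 2 and ''c'' = 1 assumed. Residents never invest. Mutant "matchers" open with a small investment and then copy their partner's previous investment.

```python
def average_payoff(strategy_a, strategy_b, rounds=10, b=2.0, c=1.0):
    """Average per-round payoff to A in a hypothetical continuous donation game.

    Investing x (between 0 and 1) costs the investor c*x and gives the partner b*x.
    A strategy maps the partner's previous investment (None in round 1) to an investment.
    """
    last_a = last_b = None
    total = 0.0
    for _ in range(rounds):
        x_a, x_b = strategy_a(last_b), strategy_b(last_a)
        total += b * x_b - c * x_a
        last_a, last_b = x_a, x_b
    return total / rounds


def never_invest(_partner_last):
    """Resident non-cooperator: always invests nothing."""
    return 0.0


def matcher(opening):
    """Reciprocator: invest `opening` first, then match the partner's previous investment."""
    def strategy(partner_last):
        return opening if partner_last is None else partner_last
    return strategy
```

In this toy model, two matchers that meet earn (''b''−''c'')·''opening'' per round above the residents' zero payoff. Matchers opening at 0.05 thus gain only 0.05 per round from assorting, whereas full cooperators (opening 1.0, the discrete tit-for-tat case) gain 1.0. Against a resident, a matcher loses only its small opening investment.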
Players cannot seem to coordinate mutual cooperation, and thus often get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies.<ref>{{cite book|last=Spaniel|first=William|title=Game Theory 101: The Complete Textbook|year=2011}}</ref> Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay lose-shift: if you can get away with cheating, repeat that behavior; if you get caught, switch. This strategy outperforms a simple tit-for-tat strategy.<ref>{{cite journal|last=Nowak|first=Martin|author2=Karl Sigmund|title=A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game|journal=Nature|year=1993|volume=364|issue=6432|doi=10.1038/364056a0|pages=56–58|pmid=8316296|bibcode=1993Natur.364...56N}}</ref>
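One ingredient of win-stay lose-shift's advantage is how each strategy copes with occasional mistakes. The minimal Python sketch below is not a reproduction of Nowak and Sigmund's simulations. It assumes execution errors that flip each intended move with probability ε, conventional payoffs 3, 0, 5, 1, and hypothetical function names. It computes the long-term payoff when each strategy plays against itself.

```python
PAYOFF_X = (3.0, 0.0, 5.0, 1.0)  # X's payoff in cc, cd, dc, dd (assumed values)
TIT_FOR_TAT = (1.0, 0.0, 1.0, 0.0)
WIN_STAY_LOSE_SHIFT = (1.0, 0.0, 0.0, 1.0)


def with_noise(strategy, eps):
    """Each intended move is flipped with probability eps (execution errors)."""
    return tuple((1 - eps) * p + eps * (1 - p) for p in strategy)


def stationary(p, q):
    """Stationary distribution over (cc, cd, dc, dd) for memory-1 strategies p (X) and q (Y)."""
    q_seen_from_x = (q[0], q[2], q[1], q[3])  # Y sees cd as dc and vice versa
    m = [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)]
         for a, b in zip(p, q_seen_from_x)]
    n = 4
    rows = [[m[j][i] - (1.0 if i == j else 0.0) for j in range(n)] + [0.0]
            for i in range(n - 1)]
    rows.append([1.0] * n + [1.0])  # probabilities sum to 1
    for col in range(n):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def self_play_payoff(strategy, eps):
    """Long-term average payoff when a noisy strategy meets a copy of itself."""
    noisy = with_noise(strategy, eps)
    v = stationary(noisy, noisy)
    return sum(vi * pay for vi, pay in zip(v, PAYOFF_X))
```

With ε = 0.01, noisy tit for tat against itself spends equal time in all four outcomes and earns 2.25. A single error sets off alternating retaliation, and a second error is needed to end it. Win-stay lose-shift passes through one round of mutual defection and then returns to mutual cooperation, earning about 2.95.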
An important difference between climate-change politics and the prisoner's dilemma is uncertainty: the extent and pace at which pollution can change the climate are not known. The dilemma faced by governments therefore differs from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests that states will cooperate much less than in a real iterated prisoner's dilemma. The probability of avoiding a possible climate catastrophe is then much smaller than a game-theoretical analysis based on a real iterated prisoner's dilemma would suggest.<ref>{{cite web|last=Rehmeyer|first=Julie|title=Game theory suggests current climate negotiations won't avert catastrophe|url=https://www.sciencenews.org/article/game-theory-suggests-current-climate-negotiations-won%E2%80%99t-avert-catastrophe|work=Science News|publisher=Society for Science & the Public|date=2012-10-29}}</ref>
The cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Animals often engage in long-term partnerships, which can be modeled more specifically as an iterated prisoner's dilemma. For example, guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.
===Iterated snowdrift===
 
 
{{main|snowdrift game}}
 
Researchers from the University of Lausanne and the University of Edinburgh have suggested that the "Iterated Snowdrift Game" may more closely reflect real-world social situations. Although this model is actually a game of chicken rather than a prisoner's dilemma, it is described here. In this model, the risk of being exploited through defection is lower, and individuals always gain from taking the cooperative choice. The snowdrift game imagines two drivers stuck on opposite sides of a snowdrift. Each can either shovel snow to clear a path or remain in their car. A player's highest payoff comes from leaving the opponent to clear all the snow by themselves, but the opponent is still nominally rewarded for their work.
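The difference from the prisoner's dilemma shows up in the payoff ordering. The minimal Python sketch below assumes a parametrization of its own: a cleared road is worth ''b'' to each driver, and shoveling costs ''c'' in total, shared when both shovel. The names and the values ''b'' = 4, ''c'' = 2 are hypothetical and need not match the table below.

```python
def snowdrift_payoffs(b, c):
    """Row player's payoff for (own action, other action); 'C' = shovel, 'D' = stay in the car.

    Hypothetical parametrisation: a cleared road is worth b to each driver and
    shovelling costs c in total, shared equally when both shovel.
    """
    return {
        ("C", "C"): b - c / 2,
        ("C", "D"): b - c,
        ("D", "C"): b,
        ("D", "D"): 0.0,
    }


def best_response(payoffs, other_action):
    """Action that maximises the row player's payoff against a fixed other action."""
    return max(("C", "D"), key=lambda own: payoffs[(own, other_action)])
```

With ''b'' = 4 and ''c'' = 2 the payoffs are 3, 2, 4 and 0. Shoveling alone (2) still beats mutual inaction (0), so the best reply to a defector is to cooperate, unlike in the prisoner's dilemma. The best reply to a cooperator remains to defect.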
{| class="wikitable" style="text-align: center;"
 
{| class="wikitable" style="text-align: center;"
|+ Example iterated snowdrift payouts (A, B)
    
! {{diagonal split header|&nbsp;A|B&nbsp;}} !! Cooperate !! Defect
A game modeled after the (iterated) prisoner's dilemma is a central focus of the 2012 video game ''Zero Escape: Virtue's Last Reward'' and a minor part of its 2016 sequel ''Zero Escape: Zero Time Dilemma''.
     