更改

囚徒困境 (查看源代码)

2020年10月11日 (日) 20:13的版本

添加858字节、 2020年10月11日 (日) 20:13

无编辑摘要

第92行：第92行：

The prisoner's dilemma is a standard example of a game analyzed in game theory that shows why two completely rational individuals might not cooperate, even if it appears that it is in their best interests to do so. It was originally framed by Merrill Flood and Melvin Dresher while working at RAND in 1950. Albert W. Tucker formalized the game with prison sentence rewards and named it "prisoner's dilemma", presenting it as follows:

−

~~囚徒困境~~是 ~~博弈论~~分析的一个代表性例子，它揭示了为什么两个完全理性的个体可能不会合作，即使这样做似乎对他们最有利。它最初是由梅里尔·弗勒德和梅文·加舍尔于1950年在兰德公司工作时构建的。阿尔伯特.W.塔克将这种博弈以囚徒的方式加以阐述，并将其命名为“~~囚徒困境~~” ，具体阐述如下：

+

囚徒困境prisoner's dilemma是 博弈论game theory分析的一个代表性例子，它揭示了为什么两个完全理性的个体可能不会合作，即使这样做似乎对他们最有利。它最初是由梅里尔·弗勒德和梅文·加舍尔于1950年在兰德公司工作时构建的。阿尔伯特.W.塔克将这种博弈以囚徒的方式加以阐述，并将其命名为“囚徒困境prisoner's dilemma” ，具体阐述如下：

第131行：第131行：

The prisoner's dilemma game can be used as a model for many real world situations involving cooperative behavior. In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games: for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it difficult or expensive (though not necessarily impossible) to coordinate their activities.

−

~~囚徒困境~~博弈可以作为许多现实世界中涉及合作行为的模型。在非正式用法中，”囚徒困境”一词可适用于不严格符合传统或迭代博弈的正式标准的情况: 例如，两个实体可以从合作中获得巨大利益或者会因为不能合作而遭受损失，但却发现协调其活动很困难或代价昂贵（并非不可能存在这种情况）。

+

囚徒困境prisoner's dilemma博弈可以作为许多现实世界中涉及合作行为的模型。在非正式用法中，”囚徒困境”一词可适用于不严格符合传统或迭代博弈的正式标准的情况: 例如，两个实体可以从合作中获得巨大利益或者会因为不能合作而遭受损失，但却发现协调其活动很困难或代价昂贵（并非不可能存在这种情况）。

==Strategy for the prisoner's dilemma==

−

~~囚徒困境~~的策略

+

囚徒困境prisoner's dilemma的策略

第347行：第347行：

and to be a prisoner's dilemma game in the strong sense, the following condition must hold for the payoffs:

−

要成为强意义下的 ~~囚徒困境~~博弈，收益必须满足以下条件:

+

要成为强意义下的 囚徒困境prisoner's dilemma博弈，收益必须满足以下条件:
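The condition itself is not reproduced in this diff hunk. A minimal sketch, assuming the standard ordering T > R > P > S (temptation, reward, punishment, sucker's payoff) used in the game-theory literature, could check a candidate payoff set as follows. The function name and the sample values 5, 3, 1, 0 are illustrative assumptions, not taken from the page.

```python
def is_strong_prisoners_dilemma(T, R, P, S):
    """Return True when the payoffs satisfy T > R > P > S.

    T: temptation (defect while the other cooperates)
    R: reward (both cooperate)
    P: punishment (both defect)
    S: sucker's payoff (cooperate while the other defects)
    Assumption: this is the conventional strong-sense ordering.
    """
    return T > R > P > S


# Illustrative payoff values only.
print(is_strong_prisoners_dilemma(T=5, R=3, P=1, S=0))
print(is_strong_prisoners_dilemma(T=3, R=5, P=1, S=0))
```

With this ordering, defection strictly dominates cooperation for each player even though mutual cooperation pays more than mutual defection.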

第457行：第457行：

Note that <math>2R>T+S</math> (i.e. <math>2(b-c)>b-c</math>) which qualifies the donation game to be an iterated game (see next section).

−

请注意 <math>2R>T+S</math>(即 <math>2(b-c)>b-c</math>)，这使得 ~~捐赠博弈~~成为一个迭代博弈(见下一节)。

+

请注意 <math>2R>T+S</math>(即 <math>2(b-c)>b-c</math>)，这使得 捐赠博弈donation game 成为一个迭代博弈(见下一节)。

第464行：第464行：

The donation game may be applied to markets. Suppose X grows oranges, Y grows apples. The marginal utility of an apple to the orange-grower X is b, which is higher than the marginal utility (c) of an orange, since X has a surplus of oranges and no apples. Similarly, for apple-grower Y, the marginal utility of an orange is b while the marginal utility of an apple is c. If X and Y contract to exchange an apple and an orange, and each fulfills their end of the deal, then each receive a payoff of b-c. If one "defects" and does not deliver as promised, the defector will receive a payoff of b, while the cooperator will lose c. If both defect, then neither one gains or loses anything.

−

~~捐赠博弈~~可能适用于市场。假设 X种橘子，Y 种苹果。苹果对橙子种植者 X 的边际效用是 b，这比橙子的边际效用c高，因为 x 有橙子剩余而没有苹果。同样，对于苹果种植者 y 来说，橙子的边际效用是 b，而苹果的边际效用是 c。如果 X 和Y签约交换一个苹果和一个橙子，并且每个人都完成了交易，那么每个人都会得到从c到b的效用收益。如果一方违约没有按照承诺交货，那么这个违约者将得到 b 效用的收益，而合作者将失去 c的效用收益。如果两者都违约，那么谁也不会得到或失去任何东西。

+

捐赠博弈donation game 可能适用于市场。假设 X种橘子，Y 种苹果。苹果对橙子种植者 X 的边际效用是 b，这比橙子的边际效用c高，因为 x 有橙子剩余而没有苹果。同样，对于苹果种植者 y 来说，橙子的边际效用是 b，而苹果的边际效用是 c。如果 X 和Y签约交换一个苹果和一个橙子，并且每个人都完成了交易，那么每个人都会得到从c到b的效用收益。如果一方违约没有按照承诺交货，那么这个违约者将得到 b 效用的收益，而合作者将失去 c的效用收益。如果两者都违约，那么谁也不会得到或失去任何东西。
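The apple-and-orange exchange can be turned into numbers. The sketch below assumes the usual mapping of the donation game onto prisoner's dilemma payoffs (T = b, R = b - c, P = 0, S = -c, with b > c > 0). The class and function names are hypothetical, and b = 3, c = 1 are illustrative values only.

```python
from typing import NamedTuple


class Payoffs(NamedTuple):
    T: float  # defector's payoff when the other side delivers
    R: float  # payoff when both deliver
    P: float  # payoff when neither delivers
    S: float  # cooperator's payoff when the other side defects


def donation_game(b: float, c: float) -> Payoffs:
    """Build the payoffs of a donation game with benefit b and cost c."""
    if not b > c > 0:
        raise ValueError("the donation game assumes b > c > 0")
    return Payoffs(T=b, R=b - c, P=0.0, S=-c)


payoffs = donation_game(b=3, c=1)
print(payoffs)
# Both the dilemma ordering and the iterated-game condition can be checked.
print(payoffs.T > payoffs.R > payoffs.P > payoffs.S)
print(2 * payoffs.R > payoffs.T + payoffs.S)
```

Any b > c > 0 passes both checks, since 2(b - c) > b - c reduces to b > c.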

==The iterated prisoner's dilemma==

−

迭代 ~~囚徒困境~~{{more citations needed section|date=November 2012}}

+

迭代 囚徒困境prisoner's dilemma{{more citations needed section|date=November 2012}}

第475行：第475行：

If two players play prisoner's dilemma more than once in succession and they remember previous actions of their opponent and change their strategy accordingly, the game is called iterated prisoner's dilemma.

−

如果两个参与者连续进行多次 ~~囚徒困境~~博弈，他们记住对手先前的行动并相应地改变策略，这种博弈被称为迭代 ~~囚徒困境~~。

+

如果两个参与者连续进行多次 囚徒困境prisoner's dilemma博弈，他们记住对手先前的行动并相应地改变策略，这种博弈被称为迭代 囚徒困境prisoner's dilemma。

第491行：第491行：

The iterated prisoner's dilemma game is fundamental to some theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modeled by a multi-player, iterated, version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner's dilemma has also been referred to as the "peace-war game".

−

迭代 ~~囚徒困境~~博弈是一些人类合作与信任理论的基础。假设博弈可以为两个需要信任的人之间的交易建模，那么群体中的合作行为也可以由多个参与者迭代的博弈模型来建模。因此，这些年来，它吸引了许多学者。1975年，葛夫曼和普尔估计关于它的学术文章超过2000篇。迭代的 ~~囚徒困境~~也被称为“和平-战争博弈”。

+

迭代 囚徒困境prisoner's dilemma博弈是一些人类合作与信任理论的基础。假设博弈可以为两个需要信任的人之间的交易建模，那么群体中的合作行为也可以由多个参与者迭代的博弈模型来建模。因此，这些年来，它吸引了许多学者。1975年，葛夫曼和普尔估计关于它的学术文章超过2000篇。迭代的 囚徒困境prisoner's dilemma也被称为“和平-战争博弈”。

第512行：第512行：

For cooperation to emerge between game theoretic rational players, the total number of rounds N must be unknown to the players. In this case "always defect" may no longer be a strictly dominant strategy, only a Nash equilibrium. Amongst results shown by Robert Aumann in a 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.

−

为了使 ~~博弈论~~理性参与者之间出现合作，参与者必须不知道 n 回合的总数。在这种情况下，“总是叛变”可能不再是一个严格占主导地位的策略，而只是一个纳什均衡点。罗伯特·奥曼在1959年的一篇论文中表明，理性参与者在无限多次的博弈中通过反复互动可以维持合作的结果。

+

为了使 博弈论game theory理性参与者之间出现合作，参与者必须不知道 n 回合的总数。在这种情况下，“总是叛变”可能不再是一个严格占主导地位的策略，而只是一个纳什均衡点。罗伯特·奥曼在1959年的一篇论文中表明，理性参与者在无限多次的博弈中通过反复互动可以维持合作的结果。
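One common way to model an unknown number of rounds is a fixed probability that the game continues after each round. This modelling choice is an assumption; the text above does not specify it. Under that assumption, two grim-trigger players keep cooperating whenever the continuation probability is at least (T - R) / (T - P). The function name and payoff values below are illustrative.

```python
def min_continuation_probability(T: float, R: float, P: float) -> float:
    """Smallest continuation probability at which grim trigger sustains cooperation.

    Cooperating forever is worth R / (1 - d); defecting once and being
    punished forever is worth T + d * P / (1 - d). Cooperation is at least
    as good when d >= (T - R) / (T - P).
    """
    if not T > R > P:
        raise ValueError("expected T > R > P")
    return (T - R) / (T - P)


# Illustrative payoffs T=5, R=3, P=1.
print(min_continuation_probability(T=5, R=3, P=1))
```

The threshold rises as the temptation T grows relative to the reward R, so cooperation needs a longer expected game.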

第520行：第520行：

According to a 2019 experimental study in the American Economic Review which tested what strategies real-life subjects used in iterated prisoners' dilemma situations with perfect monitoring, the majority of chosen strategies were always defect, tit-for-tat, and Grim trigger. Which strategy the subjects chose depended on the parameters of the game.

−

《美国经济评论》(American Economic Review) 2019年的一项实验研究测试了现实中的受试者在具有完美监控的迭代囚徒困境情境下所使用的策略，结果显示，被选择的策略大多是总是叛变、以牙还牙，抑或是 ~~冷酷触发策略~~。受试者选择的策略取决于博弈的参数。

+

《美国经济评论》(American Economic Review) 2019年的一项实验研究测试了现实中的受试者在具有完美监控的迭代囚徒困境情境下所使用的策略，结果显示，被选择的策略大多是总是叛变、以牙还牙，抑或是 冷酷触发策略Grim trigger。受试者选择的策略取决于博弈的参数。

===Strategy for the iterated prisoner's dilemma===

−

迭代 ~~囚徒困境~~下的策略

+

迭代 囚徒困境prisoner's dilemma下的策略

Interest in the iterated prisoner's dilemma (IPD) was kindled by [[Robert Axelrod]] in his book ''[[The Evolution of Cooperation]]'' (1984). In it he reports on a tournament he organized of the ''N'' step prisoner's dilemma (with ''N'' fixed) in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

Interest in the iterated prisoner's dilemma (IPD) was kindled by Robert Axelrod in his book The Evolution of Cooperation (1984). In it he reports on a tournament he organized of the N step prisoner's dilemma (with N fixed) in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

−

罗伯特·阿克塞尔罗德在他的著作《合作的进化》(1984)中激起了人们对迭代的 ~~囚徒困境~~(IPD)的兴趣。在这篇文章中，他报道了一个关于 n 次 ~~囚徒困境~~的比赛，参赛者必须一次又一次地选择他们共同的策略，并且要记住他们之前的遭遇。阿克塞尔罗德邀请世界各地的学术界同仁设计计算机策略来参加此次比赛。输入的程序在算法复杂性、最初敌意、宽恕能力等方面差异很大。

+

罗伯特·阿克塞尔罗德在他的著作《合作的进化》(1984)中激起了人们对迭代的 囚徒困境prisoner's dilemma(IPD)的兴趣。在这篇文章中，他报道了一个关于 n 次 囚徒困境prisoner's dilemma的比赛，参赛者必须一次又一次地选择他们共同的策略，并且要记住他们之前的遭遇。阿克塞尔罗德邀请世界各地的学术界同仁设计计算机策略来参加此次比赛。输入的程序在算法复杂性、最初敌意、宽恕能力等方面差异很大。

第546行：第546行：

The winning deterministic strategy was tit for tat, which Anatol Rapoport developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of BASIC, and won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness". When the opponent defects, on the next move, the player sometimes cooperates anyway, with a small probability (around 1–5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the line-up of opponents.

−

最终获胜的决定性策略是以牙还牙，这是阿纳托尔·拉波波特开发并参加比赛的策略。这是所有参赛程序中最简单的一个，只有四行 BASIC 语言，并且赢得了比赛。策略很简单，就是在游戏的第一次迭代中进行合作; 在此之后，玩家做他或她的对手在前一步中所做的事情。根据具体情况，一个稍微好一点的策略可以是“~~带着宽恕的心以牙还牙~~”。当对手叛变时，在下一次博弈中，玩家有时还是会合作，但概率很小(大约1-5%)。这允许博弈能偶尔从陷入叛变循环中恢复过来。确切的概率取决于对手的安排。

+

最终获胜的决定性策略是以牙还牙，这是阿纳托尔·拉波波特开发并参加比赛的策略。这是所有参赛程序中最简单的一个，只有四行 BASIC 语言，并且赢得了比赛。策略很简单，就是在游戏的第一次迭代中进行合作; 在此之后，玩家做他或她的对手在前一步中所做的事情。根据具体情况，一个稍微好一点的策略可以是“带着宽恕的心以牙还牙tit for tat with forgiveness”。当对手叛变时，在下一次博弈中，玩家有时还是会合作，但概率很小(大约1-5%)。这允许博弈能偶尔从陷入叛变循环中恢复过来。确切的概率取决于对手的安排。
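The two strategies described above can be sketched in a few lines. This is an illustration only, not Rapoport's original BASIC program. The payoff values (5, 3, 1, 0), the function names, and the `play` helper are assumptions made for the example.

```python
import random

C, D = "C", "D"
# Illustrative payoffs: (row player's score, column player's score).
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}


def tit_for_tat(own_history, opp_history, rng):
    """Cooperate first, then repeat the opponent's previous move."""
    return C if not opp_history else opp_history[-1]


def make_forgiving_tit_for_tat(forgiveness=0.05):
    """Tit for tat that cooperates after a defection with a small probability."""
    def strategy(own_history, opp_history, rng):
        if not opp_history:
            return C
        if opp_history[-1] == D and rng.random() < forgiveness:
            return C  # occasional forgiveness can break a defection cycle
        return opp_history[-1]
    return strategy


def always_defect(own_history, opp_history, rng):
    return D


def play(strategy_a, strategy_b, rounds, seed=0):
    """Play an iterated match and return the two total scores."""
    rng = random.Random(seed)
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b, rng)
        move_b = strategy_b(hist_b, hist_a, rng)
        gain_a, gain_b = PAYOFF[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat, rounds=10))
print(play(tit_for_tat, always_defect, rounds=10))
print(play(make_forgiving_tit_for_tat(0.05), always_defect, rounds=10))
```

Against a pure defector, tit for tat loses only the first round. The forgiving variant gives away a little more against that opponent, but can recover cooperation against opponents that retaliate.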

第562行：第562行：

Nice: The most important condition is that the strategy must be "nice", that is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice; therefore, a purely selfish strategy will not "cheat" on its opponent, for purely self-interested reasons first.

−

友好：最重要的条件是策略必须是好的，也就是说，它不会在对手之前叛变(这有时被称为“乐观”算法)。几乎所有得分最高的策略都是友好的; 因此，~~一个纯粹自私的策略不会出于纯粹自身利益的原因而“欺骗”对手~~。

+

友好：最重要的条件是策略必须是好的，也就是说，它不会在对手之前叛变(这有时被称为“乐观”算法)。几乎所有得分最高的策略都是友好的; 因此，一个纯粹自私的策略不会出于纯粹自身利益的原因而“欺骗”对手a purely selfish strategy will not "cheat" on its opponent, for purely self-interested reasons first。

; Retaliating: However, Axelrod contended, the successful strategy must not be a blind optimist. It must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as "nasty" strategies will ruthlessly exploit such players.

第588行：第588行：

The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.

−

~~对于一次性的 囚徒困境博弈，最优~~(点数最大化)策略就是简单的叛变; 正如上面所说，无论对手的构成如何，这都是正确的。然而，在迭代 囚徒困境博弈中，最优策略取决于可能的对手的策略，以及他们对叛变和合作的反应。例如，考虑一个群体，其中每个人每次都会叛变，除了一个人遵循以牙还牙的策略。那个人就会由于第一回合的失利而处于轻微的不利地位。在这样一个群体中，个体的最佳策略是每次都叛变。在一定比例的总是选择背叛的玩家和其余组成为以牙还牙的玩家的人群中，个人的最佳策略取决于这一比例和博弈的次数。

+

对于一次性的囚徒困境博弈，最优(点数最大化)策略就是简单的叛变; 正如上面所说，无论对手的构成如何，这都是正确的。然而，在迭代囚徒困境博弈中，最优策略取决于可能的对手的策略，以及他们对叛变和合作的反应。例如，考虑一个群体，其中每个人每次都会叛变，除了一个人遵循以牙还牙的策略。那个人就会由于第一回合的失利而处于轻微的不利地位。在这样一个群体中，个体的最佳策略是每次都叛变。在一定比例的总是选择背叛的玩家和其余组成为以牙还牙的玩家的人群中，个人的最佳策略取决于这一比例和博弈的次数。
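The dependence on the population mix and the game length can be worked through numerically. The sketch below assumes each player meets one opponent drawn at random from a large population of tit for tat players and always-defectors. The payoffs (5, 3, 1, 0) and the function name are illustrative assumptions.

```python
def expected_scores(tft_share, rounds, T=5, R=3, P=1, S=0):
    """Expected match score of a tit for tat player and of an always-defector.

    tft_share: fraction of the population playing tit for tat (0..1);
               the remainder always defects.
    rounds:    number of rounds in each match.
    """
    tft_vs_tft = rounds * R
    tft_vs_alld = S + (rounds - 1) * P    # exploited once, then mutual defection
    alld_vs_tft = T + (rounds - 1) * P    # exploits once, then mutual defection
    alld_vs_alld = rounds * P

    tft = tft_share * tft_vs_tft + (1 - tft_share) * tft_vs_alld
    alld = tft_share * alld_vs_tft + (1 - tft_share) * alld_vs_alld
    return tft, alld


for share in (0.0, 0.1, 0.5):
    tft, alld = expected_scores(share, rounds=10)
    best = "tit for tat" if tft > alld else "always defect"
    print(f"tft_share={share}: tft={tft}, always_defect={alld}, best={best}")
```

With no other tit for tat players, always defecting scores slightly higher, matching the first-turn disadvantage described above. As the tit for tat share or the number of rounds grows, the ranking flips.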

第616行：第616行：

Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England introduced a new strategy at the 20th-anniversary iterated prisoner's dilemma competition, which proved to be more successful than tit for tat. This strategy relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result, the 2004 Prisoners' Dilemma Tournament results show University of Southampton's strategies in the first three places, despite having fewer wins and many more losses than the GRIM strategy. (In a PD tournament, the aim of the game is not to "win" matches – that can easily be achieved by frequent defection). Also, even without implicit collusion between software strategies (exploited by the Southampton team) tit for tat is not always the absolute winner of any given tournament; it would be more precise to say that its long run results over a series of tournaments outperform its rivals. (In any one event a given strategy can be slightly better adjusted to the competition than tit for tat, but tit for tat is more robust). The same applies for the tit for tat with forgiveness variant, and other optimal strategies: on any given day they might not "win" against a specific mix of counter-strategies. An alternative way of putting it is using the Darwinian ESS simulation. 
In such a simulation, tit for tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit for tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Richard Dawkins showed that here, no static mix of strategies forms a stable equilibrium and the system will always oscillate between bounds. The Southampton strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.

−

尽管以牙还牙认为是最有力的基本策略，来自英格兰南安普敦大学的一个团队在20周年的迭代 ~~囚徒困境~~竞赛中提出了一个新策略，这个策略被证明比以牙还牙更为成功。这种策略依赖于程序之间的串通，以获得单个程序的最高分数。这所大学向比赛提交了60个程序，这些程序的设计目的是在比赛开始时通过一系列的5到10个动作来互相认识。一旦认识建立，一个程序总是合作，另一个程序总是叛变，保证叛变者得到最多的分数。如果这个程序意识到它正在和一个非南安普顿的球员比赛，它会不断地叛变，试图最小化与之竞争程序的得分。因此，2004年囚徒困境锦标赛的结果显示了南安普敦大学战略位居前三名，尽管它比冷酷战略赢得更少，输的更多。(在囚徒困境锦标赛中，比赛的目的不是“赢”比赛——这一点频繁叛变很容易实现)。此外，即使没有软件策略之间的暗中串通(南安普顿队利用了这一点) ，以牙还牙并不总是任何特定锦标赛的绝对赢家; 更准确地说，它是在一系列锦标赛中的长期结果超过了它的竞争对手。(在任何一个事件中，一个给定的策略可以比以牙还牙稍微更好地适应竞争，但是以牙还牙更有力)。这同样适用于带有宽恕变量的以牙还牙，和其他最佳策略: 在任何特定的一天，他们可能不会“赢”一个特定的混合反战略。另一种方法是使用达尔文的 ~~ESS模拟~~。在这样的模拟中，以牙还牙几乎总是占主导地位，尽管讨厌的策略会在人群中进进出出，因为使用以牙还牙策略的人群可以通过非报复性的好策略进行渗透，这反过来使他们容易成为讨厌策略的猎物。理查德·道金斯指出，在这里，没有静态的混合策略会形成一个稳定的平衡，系统将始终在界限之间振荡。这种策略最终在比赛中获得了前三名的位置，以及一些接近垫底的位置。

+

尽管以牙还牙认为是最有力的基本策略，来自英格兰南安普敦大学的一个团队在20周年的迭代 囚徒困境prisoner's dilemma竞赛中提出了一个新策略，这个策略被证明比以牙还牙更为成功。这种策略依赖于程序之间的串通，以获得单个程序的最高分数。这所大学向比赛提交了60个程序，这些程序的设计目的是在比赛开始时通过一系列的5到10个动作来互相认识。一旦认识建立，一个程序总是合作，另一个程序总是叛变，保证叛变者得到最多的分数。如果这个程序意识到它正在和一个非南安普顿的球员比赛，它会不断地叛变，试图最小化与之竞争程序的得分。因此，2004年囚徒困境锦标赛的结果显示了南安普敦大学战略位居前三名，尽管它比冷酷战略赢得更少，输的更多。(在囚徒困境锦标赛中，比赛的目的不是“赢”比赛——这一点频繁叛变很容易实现)。此外，即使没有软件策略之间的暗中串通(南安普顿队利用了这一点) ，以牙还牙并不总是任何特定锦标赛的绝对赢家; 更准确地说，它是在一系列锦标赛中的长期结果超过了它的竞争对手。(在任何一个事件中，一个给定的策略可以比以牙还牙稍微更好地适应竞争，但是以牙还牙更有力)。这同样适用于带有宽恕变量的以牙还牙，和其他最佳策略: 在任何特定的一天，他们可能不会“赢”一个特定的混合反战略。另一种方法是使用达尔文的 ESS模拟ESS simulation。在这样的模拟中，以牙还牙几乎总是占主导地位，尽管讨厌的策略会在人群中进进出出，因为使用以牙还牙策略的人群可以通过非报复性的好策略进行渗透，这反过来使他们容易成为讨厌策略的猎物。理查德·道金斯指出，在这里，没有静态的混合策略会形成一个稳定的平衡，系统将始终在界限之间振荡。这种策略最终在比赛中获得了前三名的位置，以及一些接近垫底的位置。

第629行：第629行：

===Stochastic iterated prisoner's dilemma===

−

+

随机的迭代囚徒困境

第636行：第636行：

In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities". In an encounter between player X and player Y, X 's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where <math>P_{ab}</math> is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then <math>P_{cd}</math> is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit for tat strategy written as P={1,0,1,0}, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy written as P={1,0,0,1}, in which X responds as in the previous encounter, if it was a "win" (i.e. cc or dc) but changes strategy if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.

−

在随机迭代 ~~囚徒困境~~博弈中，策略由“合作概率”来确定。在玩家 x 和玩家 y 之间的遭遇中，x 的策略由一组与 y 合作的概率 p 确定，p 是他们之前遭遇的结果的函数，或者是其中的一些子集。如果 p 只是他们最近 n 次遭遇的函数，那么它被称为“记忆-n”策略。我们可以用四个合作概率确定一个记忆-1策略: p={p_{cc}, p_{cd}, p_{dc}, p_{dd}}，其中 p_{ab} 是在上一次遭遇的结果为 (ab) 的情况下 x 在本次遭遇中合作的概率。例如，如果上一次遭遇中 x 合作而 y 叛变，那么 p_{cd} 就是 x 在本次遭遇中合作的概率。如果每个概率都是1或0，这种策略称为确定性策略。确定性策略的一个例子是以牙还牙策略，写成 p={1,0,1,0}，其中 x 的反应和 y 在前一次遭遇中的反应一样。另一种是胜-保持-败-转换策略，它被写成 p={1,0,0,1}，在这种策略中，如果 x 获得胜利(即 cc 或 dc)，x 会做出与上一次遭遇一样的反应，但如果失败(即 cd 或 dd)，x 会改变策略。研究表明，对于任何一种记忆-n 策略，存在一个相应的记忆-1策略，这个策略给出相同的统计结果，因此只需要考虑记忆-1策略。

+

在随机迭代 囚徒困境prisoner's dilemma博弈中，策略由“合作概率”来确定。在玩家 x 和玩家 y 之间的遭遇中，x 的策略由一组与 y 合作的概率 p 确定，p 是他们之前遭遇的结果的函数，或者是其中的一些子集。如果 p 只是他们最近 n 次遭遇的函数，那么它被称为“记忆-n”策略。我们可以用四个合作概率确定一个记忆-1策略: p={p_{cc}, p_{cd}, p_{dc}, p_{dd}}，其中 p_{ab} 是在上一次遭遇的结果为 (ab) 的情况下 x 在本次遭遇中合作的概率。例如，如果上一次遭遇中 x 合作而 y 叛变，那么 p_{cd} 就是 x 在本次遭遇中合作的概率。如果每个概率都是1或0，这种策略称为确定性策略。确定性策略的一个例子是以牙还牙策略，写成 p={1,0,1,0}，其中 x 的反应和 y 在前一次遭遇中的反应一样。另一种是胜-保持-败-转换策略，它被写成 p={1,0,0,1}，在这种策略中，如果 x 获得胜利(即 cc 或 dc)，x 会做出与上一次遭遇一样的反应，但如果失败(即 cd 或 dd)，x 会改变策略。研究表明，对于任何一种记忆-n 策略，存在一个相应的记忆-1策略，这个策略给出相同的统计结果，因此只需要考虑记忆-1策略。
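A memory-1 strategy as defined above is just four cooperation probabilities indexed by the previous outcome. The sketch below encodes tit for tat as (1, 0, 1, 0) and win-stay, lose-switch as (1, 0, 0, 1). The class name, the `simulate` helper, and the choice of opening move are assumptions, since the four probabilities say nothing about the first round.

```python
import random

C, D = "c", "d"


class Memory1:
    """Memory-1 strategy given by (P_cc, P_cd, P_dc, P_dd).

    P_ab is the probability of cooperating when, in the previous round,
    this player chose a and the opponent chose b.
    """

    def __init__(self, p_cc, p_cd, p_dc, p_dd, first_move=C):
        self.probs = {(C, C): p_cc, (C, D): p_cd, (D, C): p_dc, (D, D): p_dd}
        self.first_move = first_move  # assumption: opening move is fixed

    def move(self, last_own, last_opp, rng):
        if last_own is None:
            return self.first_move
        return C if rng.random() < self.probs[(last_own, last_opp)] else D


TIT_FOR_TAT = Memory1(1, 0, 1, 0)
WIN_STAY_LOSE_SWITCH = Memory1(1, 0, 0, 1)
ALWAYS_DEFECT = Memory1(0, 0, 0, 0, first_move=D)


def simulate(x, y, rounds, seed=0):
    """Return the list of (x_move, y_move) pairs for one match."""
    rng = random.Random(seed)
    last_x = last_y = None
    history = []
    for _ in range(rounds):
        move_x = x.move(last_x, last_y, rng)
        move_y = y.move(last_y, last_x, rng)
        history.append((move_x, move_y))
        last_x, last_y = move_x, move_y
    return history


print(simulate(TIT_FOR_TAT, ALWAYS_DEFECT, rounds=4))
print(simulate(WIN_STAY_LOSE_SWITCH, ALWAYS_DEFECT, rounds=4))
```

Probabilities strictly between 0 and 1 give genuinely stochastic strategies; for example, `Memory1(1, 0.05, 1, 0.05)` behaves like tit for tat with occasional forgiveness.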

第664行：第664行：

The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma (IPD) illustrated in a Venn diagram. Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that are always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies, under both strong and weak selection. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit-for-tat lies at the intersection of cooperating, defecting and ZD strategies.

−

利用文献[1]中的 ~~维恩图~~，讨论了迭代 ~~囚徒困境~~(IPD)中零行列式(ZD)、合作策略和变节策略之间的关系。合作策略总是与其他合作策略相互配合，而变通策略总是与其他变通策略相抵触。这两种策略都包含在强选择下具有鲁棒性的策略子集，这意味着当它们驻留在一个种群中时，没有其他记忆-1策略被选择来入侵这样的策略。只有协作策略包含一个总是鲁棒的子集，这意味着在强选择和弱选择情况下，没有选择其他的记忆-1策略来入侵和替换这些策略。ZD和好的合作策略之间的交集是一套慷慨的 ZD 策略。敲诈策略是 ZD 策略和非鲁棒性叛逃策略的交集。以牙还牙是合作、背叛和 ZD 策略的交集。

+

利用文献[1]中的 维恩图Venn diagram，讨论了迭代 囚徒困境prisoner's dilemma(IPD)中零行列式(ZD)、合作策略和变节策略之间的关系。合作策略总是与其他合作策略相互配合，而变通策略总是与其他变通策略相抵触。这两种策略都包含在强选择下具有鲁棒性的策略子集，这意味着当它们驻留在一个种群中时，没有其他记忆-1策略被选择来入侵这样的策略。只有协作策略包含一个总是鲁棒的子集，这意味着在强选择和弱选择情况下，没有选择其他的记忆-1策略来入侵和替换这些策略。ZD和好的合作策略之间的交集是一套慷慨的 ZD 策略。敲诈策略是 ZD 策略和非鲁棒性叛逃策略的交集。以牙还牙是合作、背叛和 ZD 策略的交集。

第688行：第688行：

An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).

−

~~迭代囚徒困境~~的一个扩展是进化的随机迭代囚徒困境，其中允许特定策略的相对丰度发生变化，更成功的策略相对增加。这个过程可以通过让不那么成功的玩家模仿更成功的策略来完成，或者通过从游戏中淘汰不那么成功的玩家，同时让更成功的玩家成倍增加。研究表明，不公平的零决定策略不是进化稳定策略。关键的直觉告诉我们，简化稳定策略不仅要能够入侵另一个群体(这是敲诈零决定策略可以做到的) ，而且还要在同类型的其他玩家面前表现良好(敲诈 ZD 的玩家表现不佳，因为他们减少了彼此的盈余)。

+

迭代囚徒困境IPD的一个扩展是进化的随机迭代囚徒困境，其中允许特定策略的相对丰度发生变化，更成功的策略相对增加。这个过程可以通过让不那么成功的玩家模仿更成功的策略来完成，或者通过从游戏中淘汰不那么成功的玩家，同时让更成功的玩家成倍增加。研究表明，不公平的零决定策略不是进化稳定策略。关键的直觉告诉我们，简化稳定策略不仅要能够入侵另一个群体(这是敲诈零决定策略可以做到的) ，而且还要在同类型的其他玩家面前表现良好(敲诈 ZD 的玩家表现不佳，因为他们减少了彼此的盈余)。
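The idea that more successful strategies become relatively more abundant can be illustrated with a discrete replicator update, in which each strategy's share grows in proportion to its average payoff. This is a generic sketch, not the model used in the cited ZD studies. The two-strategy payoff matrix (tit for tat versus always defect over 10 rounds with payoffs 5, 3, 1, 0) is an illustrative assumption.

```python
def replicator_step(shares, payoff):
    """One discrete replicator update.

    shares: list of population shares that sum to 1.
    payoff: payoff[i][j] is the (positive) score strategy i earns against j.
    """
    n = len(shares)
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    mean_fitness = sum(s * f for s, f in zip(shares, fitness))
    return [s * f / mean_fitness for s, f in zip(shares, fitness)]


# Rows/columns: 0 = tit for tat, 1 = always defect (10-round match scores).
PAYOFF = [
    [30, 9],   # tit for tat vs tit for tat, tit for tat vs always defect
    [14, 10],  # always defect vs tit for tat, always defect vs always defect
]

shares = [0.5, 0.5]
for generation in range(5):
    shares = replicator_step(shares, PAYOFF)
    print(generation + 1, [round(s, 3) for s in shares])
```

Starting from an even split, tit for tat spreads. Starting from a very small tit for tat share (for example 1%), it shrinks instead, because a strategy must also do well against the resident population to invade.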

第709行：第709行：

===Continuous iterated prisoner's dilemma===

−

+

连续迭代囚徒困境

Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd<ref>{{cite journal | last1 = Le | first1 = S. | last2 = Boyd | first2 = R. |name-list-format=vanc| year = 2007 | title = Evolutionary Dynamics of the Continuous Iterated Prisoner's Dilemma | url = | journal = Journal of Theoretical Biology | volume = 245 | issue = 2| pages = 258–67 | doi = 10.1016/j.jtbi.2006.09.016 | pmid = 17125798 }}</ref> found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. The basic intuition for this result is straightforward: in a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from [[Assortative mating|assorting]] with one another. By contrast, in a discrete prisoner's dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit for tat-like cooperation are extremely rare in nature (ex. Hammerstein<ref>Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, Editor, Genetic and Cultural Evolution of Cooperation, MIT Press. pp. 83–94.

第766行：第766行：

An important difference between climate-change politics and the prisoner's dilemma is uncertainty; the extent and pace at which pollution can change climate is not known. The dilemma faced by government is therefore different from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests that states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a possible climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.

−

气候变化政治与 ~~囚徒困境~~之间的一个重要区别是不确定性; 污染对气候变化的影响程度和速度尚不清楚。因此，政府面临的困境不同于囚徒困境，因为合作的回报是未知的。这种差异表明，各国之间的合作远远少于真正的迭代囚徒困境中的合作，因此避免可能发生的气候灾难的可能性远远小于使用真正的迭代 ~~囚徒困境~~进行 ~~博弈论~~情景分析所显示的可能性。

+

气候变化政治与 囚徒困境prisoner's dilemma 之间的一个重要区别是不确定性; 污染对气候变化的影响程度和速度尚不清楚。因此，政府面临的困境不同于囚徒困境，因为合作的回报是未知的。这种差异表明，各国之间的合作远远少于真正的迭代囚徒困境中的合作，因此避免可能发生的气候灾难的可能性远远小于使用真正的迭代 囚徒困境prisoner's dilemma 进行 博弈论game theory情景分析所显示的可能性。

第783行：第783行：

Cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Often animals engage in long term partnerships, which can be more specifically modeled as iterated prisoner's dilemma. For example, guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.

−

许多动物的合作行为可以被理解为 ~~囚徒困境~~的一个例子。通常动物会有长期的伙伴关系，这种关系可以更具体地模拟为迭代的囚徒困境。例如，~~孔雀鱼成群结队地合作监察捕食者，它们被认为是在惩罚不合作的监察者~~。

+

许多动物的合作行为可以被理解为 囚徒困境prisoner's dilemma 的一个例子。通常动物会有长期的伙伴关系，这种关系可以更具体地模拟为迭代的囚徒困境。例如，孔雀鱼成群结队地合作监察捕食者，它们被认为是在惩罚不合作的监察者guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors。

第791行：第791行：

Vampire bats are social animals that engage in reciprocal food exchange. Applying the payoffs from the prisoner's dilemma can help explain this behavior:

−

吸血蝙蝠是群居动物，从事相互的食物交换。应用的 ~~囚徒困境~~收益可以帮助解释这种行为:

+

吸血蝙蝠是群居动物，从事相互的食物交换。应用的 囚徒困境prisoner's dilemma收益可以帮助解释这种行为:

* C/C: "Reward: I get blood on my unlucky nights, which saves me from starving. I have to give blood on my lucky nights, which doesn't cost me too much."

第826行：第826行：

The prisoner's dilemma has been called the E. coli of social psychology, and it has been used widely to research various topics such as oligopolistic competition and collective action to produce a collective good.

−

~~囚徒困境~~被称为社会心理学中的大肠杆菌，它被广泛用于研究寡头垄断竞争及其集体行动来产生集体利益等问题。

+

囚徒困境prisoner's dilemma被称为社会心理学中的大肠杆菌，它被广泛用于研究寡头垄断竞争及其集体行动来产生集体利益等问题。

第852行：第852行：

Doping in sport has been cited as an example of a prisoner's dilemma.

−

体育运动中的兴奋剂被认为是 ~~囚徒困境~~的一个例子。

+

体育运动中的兴奋剂被认为是 囚徒困境prisoner's dilemma的一个例子。

第870行：第870行：

In international political theory, the Prisoner's Dilemma is often used to demonstrate the coherence of strategic realism, which holds that in international relations, all states (regardless of their internal policies or professed ideology), will act in their rational self-interest given international anarchy. A classic example is an arms race like the Cold War and similar conflicts. During the Cold War the opposing alliances of NATO and the Warsaw Pact both had the choice to arm or disarm. From each side's point of view, disarming whilst their opponent continued to arm would have led to military inferiority and possible annihilation. Conversely, arming whilst their opponent disarmed would have led to superiority. If both sides chose to arm, neither could afford to attack the other, but both incurred the high cost of developing and maintaining a nuclear arsenal. If both sides chose to disarm, war would be avoided and there would be no costs.

−

在国际政治理论中， ~~囚徒困境~~经常被用来证明战略现实主义的一致性，这种战略现实主义认为，在国际关系中，由于国际无政府状态，所有国家(无论其国内政策或公开宣称的意识形态如何)都会为了自身的理性利益来行动。一个典型的例子是类似冷战和类似冲突的军备竞赛。在冷战期间，北约和华约组织的对立联盟都可以选择武装或解除武装。从双方的观点来看，解除武装而对手继续武装将导致军事劣势和可能的被歼灭。相反，如果武装的时候对手已经解除了武装，那么就会获得优势。如果双方都选择武装自己，那么任何一方都承担不起攻击对方的代价，但是双方都为发展和维持核武库付出了高昂的代价。如果双方都选择裁军，战争就可以避免，也不会有任何代价。

+

在国际政治理论中， 囚徒困境prisoner's dilemma经常被用来证明战略现实主义的一致性，这种战略现实主义认为，在国际关系中，由于国际无政府状态，所有国家(无论其国内政策或公开宣称的意识形态如何)都会为了自身的理性利益来行动。一个典型的例子是类似冷战和类似冲突的军备竞赛。在冷战期间，北约和华约组织的对立联盟都可以选择武装或解除武装。从双方的观点来看，解除武装而对手继续武装将导致军事劣势和可能的被歼灭。相反，如果武装的时候对手已经解除了武装，那么就会获得优势。如果双方都选择武装自己，那么任何一方都承担不起攻击对方的代价，但是双方都为发展和维持核武库付出了高昂的代价。如果双方都选择裁军，战争就可以避免，也不会有任何代价。

第888行：第888行：

Many real-life dilemmas involve multiple players. Although metaphorical, Hardin's tragedy of the commons may be viewed as an example of a multi-player generalization of the PD: Each villager makes a choice for personal gain or restraint. The collective reward for unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons"). A commons dilemma most people can relate to is washing the dishes in a shared house. By not washing dishes an individual can gain by saving his time, but if that behavior is adopted by every resident the collective cost is no clean plates for anyone.

−

许多现实生活中的困境牵涉到多个参与者。虽然是比喻性的，哈丁的 ~~公地悲剧~~可以被看作是 囚徒困境的多人泛化的一个例子: 每个村民做出选择是为了个人利益还是为了克制。对于一致(甚至频繁)叛变的集体回报是非常低的回报(代表了对“公共资源”的破坏)。一个大多数人都能理解的共同困境就是在一个共享的房子里洗碗。通过不洗碗，个人可以节省时间，但如果这种行为被每个居民采纳，集体成本是任何人都没有干净的盘子。

+

许多现实生活中的困境牵涉到多个参与者。虽然是比喻性的，哈丁的 公地悲剧tragedy of the commons可以被看作是 囚徒困境的多人泛化的一个例子: 每个村民做出选择是为了个人利益还是为了克制。对于一致(甚至频繁)叛变的集体回报是非常低的回报(代表了对“公共资源”的破坏)。一个大多数人都能理解的共同困境就是在一个共享的房子里洗碗。通过不洗碗，个人可以节省时间，但如果这种行为被每个居民采纳，集体成本是任何人都没有干净的盘子。

第896行：第896行：

The commons are not always exploited: William Poundstone, in a book about the prisoner's dilemma (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to take a paper without paying (defecting) but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by Elinor Ostrom, winner of the 2009 Nobel Memorial Prize in Economic Sciences, hypothesized that the tragedy of the commons is oversimplified, with the negative outcome influenced by outside influences. Without complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best case outcome for PD.

−

公共资源并不总是被利用: 威廉·庞德斯通在一本关于 ~~囚徒困境~~的书(见下文参考文献)中描述了新西兰的一种情况，报纸盒子没有上锁。人们可以不付钱就拿报纸(叛变) ，但很少有人这样做，他们觉得如果他们不付钱，那么其他人也不会付钱，这会摧毁整个体系。2009年诺贝尔经济学奖获得者埃莉诺·奥斯特罗姆随后的研究认为公地悲剧经济学过于简单化，其负面结果受到外部影响。在没有复杂压力的情况下，团体之间为了共同利益进行沟通和管理，执行社会规范以保护资源并为团体实现最大利益，这是影响囚徒困境发展最佳结果的一个例子。

+

公共资源并不总是被利用: 威廉·庞德斯通在一本关于 囚徒困境prisoner's dilemma的书(见下文参考文献)中描述了新西兰的一种情况，报纸盒子没有上锁。人们可以不付钱就拿报纸(叛变) ，但很少有人这样做，他们觉得如果他们不付钱，那么其他人也不会付钱，这会摧毁整个体系。2009年诺贝尔经济学奖获得者埃莉诺·奥斯特罗姆随后的研究认为公地悲剧经济学过于简单化，其负面结果受到外部影响。在没有复杂压力的情况下，团体之间为了共同利益进行沟通和管理，执行社会规范以保护资源并为团体实现最大利益，这是影响囚徒困境发展最佳结果的一个例子。

第1,252行：第1,252行：

Several software packages have been created to run prisoner's dilemma simulations and tournaments, some of which have available source code.

−

一些软件包已经被创建来运行 ~~囚徒困境~~模拟和比赛，其中一些有可用的源代码。

+

一些软件包已经被创建来运行 囚徒困境prisoner's dilemma模拟和比赛，其中一些有可用的源代码。

* The source code for the [[The Evolution of Cooperation|second tournament]] run by Robert Axelrod (written by Axelrod and many contributors in [[Fortran]]) is available [http://www-personal.umich.edu/~axe/research/Software/CC/CC2.html online]

第1,270行：第1,270行：

Hannu Rajaniemi set the opening scene of his The Quantum Thief trilogy in a "dilemma prison". The main theme of the series has been described as the "inadequacy of a binary universe" and the ultimate antagonist is a character called the All-Defector. Rajaniemi is particularly interesting as an artist treating this subject in that he is a Cambridge-trained mathematician and holds a PhD in mathematical physics – the interchangeability of matter and information is a major feature of the books, which take place in a "post-singularity" future. The first book in the series was published in 2010, with the two sequels, The Fractal Prince and The Causal Angel, published in 2012 and 2014, respectively.

−

汉努·拉贾尼埃米将他的《量子窃贼》三部曲的开场场景设置在一个“进退两难的监狱”中。该系列的主题被描述为“双重宇宙的不足” ，最终的对手是一个叫做全面叛变者的角色。拉贾尼埃米作为一个处理这个问题的艺术家尤其有趣，因为他是剑桥大学培养的数学家，拥有数学物理学博士学位——物质和信息的可互换性是这本书的一个主要特征，它发生在“ ~~后奇点~~”的未来。该系列的第一本书于2010年出版，其续集《分形王子》和《因果天使》分别于2012年和2014年出版。

+

汉努·拉贾尼埃米将他的《量子窃贼》三部曲的开场场景设置在一个“进退两难的监狱”中。该系列的主题被描述为“双重宇宙的不足” ，最终的对手是一个叫做全面叛变者的角色。拉贾尼埃米作为一个处理这个问题的艺术家尤其有趣，因为他是剑桥大学培养的数学家，拥有数学物理学博士学位——物质和信息的可互换性是这本书的一个主要特征，它发生在“ 后奇点post-singularity”的未来。该系列的第一本书于2010年出版，其续集《分形王子》和《因果天使》分别于2012年和2014年出版。

第1,278行：第1,278行：

A game modeled after the (iterated) prisoner's dilemma is a central focus of the 2012 video game Zero Escape: Virtue's Last Reward and a minor part in its 2016 sequel Zero Escape: Zero Time Dilemma.

−

一个模仿迭代 ~~囚徒困境~~的游戏《零度逃脱: 美德的最后奖励》是2012年电子游戏的中心焦点，也是2016年续集《零度逃脱: 极限脱出刻之困境》的一个次要部分。

+

一个模仿迭代 囚徒困境prisoner's dilemma 的游戏《零度逃脱: 美德的最后奖励》是2012年电子游戏的中心焦点，也是2016年续集《零度逃脱: 极限脱出刻之困境》的一个次要部分。

第1,294行：第1,294行：

In The Adventure Zone: Balance during The Suffering Game subarc, the player characters are twice presented with the prisoner's dilemma during their time in two liches' domain, once cooperating and once defecting.

−

在冒险区: 苦难游戏的平衡中，玩家角色在他们在两个领域的时间内两次被呈现 ~~囚徒困境~~，一次是合作，一次是叛变。

+

在冒险区: 苦难游戏的平衡中，玩家角色在他们在两个领域的时间内两次被呈现 囚徒困境prisoner's dilemma ，一次是合作，一次是叛变。

Henry

153

个编辑
