118 bytes removed · 22:13, 25 September 2020 (Fri)
No edit summary
The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.
 
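The population arithmetic above can be made concrete with a small round-robin sketch. This is hypothetical illustration code, assuming the usual Axelrod payoffs T=5, R=3, P=1, S=0 and a fixed 200-round match; the population sizes are arbitrary choices:

```python
# Round-robin iterated PD: one tit-for-tat player among always-defectors.
# Payoffs to the first mover of each pair: T=5, R=3, P=1, S=0 (Axelrod values).
PAYOFF = {('c', 'c'): 3, ('c', 'd'): 0, ('d', 'c'): 5, ('d', 'd'): 1}

def play_match(strat_a, strat_b, rounds):
    """Play `rounds` turns; each strategy maps the opponent's last move to a move."""
    score_a = score_b = 0
    last_a = last_b = None
    for _ in range(rounds):
        a, b = strat_a(last_b), strat_b(last_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        last_a, last_b = a, b
    return score_a, score_b

def tit_for_tat(opp_last):
    return 'c' if opp_last is None else opp_last  # cooperate first, then mirror

def always_defect(opp_last):
    return 'd'

def tournament(players, rounds=200):
    totals = [0] * len(players)
    for i in range(len(players)):
        for j in range(i + 1, len(players)):
            si, sj = play_match(players[i], players[j], rounds)
            totals[i] += si
            totals[j] += sj
    return totals

# One tit-for-tat player (index 0) in a population of nine always-defectors:
totals = tournament([tit_for_tat] + [always_defect] * 9)
# Tit for tat loses S vs. T on the first turn of every match, so it finishes
# strictly below every defector here -- defecting every time would have been optimal.
print(totals[0], min(totals[1:]))
```

With these numbers the lone tit-for-tat player scores 199 per match (one sucker turn, then mutual defection) against 204 for each defector it meets, so it ends below every always-defector, exactly as the text predicts.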
In the strategy called Pavlov (win-stay, lose-switch), faced with a failure to cooperate, the player switches strategy on the next turn. In certain circumstances, Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy.
 
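Pavlov's rule can be sketched in a few lines (an illustrative implementation, not taken from any particular tournament; the single forced defection at turn 9 is an arbitrary choice to show the recovery behaviour):

```python
# Win-stay, lose-switch (Pavlov): keep your previous move if the opponent
# cooperated (payoff R or T, a "win"), switch it if the opponent defected
# (payoff S or P, a "loss").
def pavlov(my_last, opp_last):
    if opp_last == 'c':
        return my_last                      # win: stay
    return 'd' if my_last == 'c' else 'c'   # loss: switch

history = []
x = y = 'c'                 # both start by cooperating
for turn in range(20):
    history.append((x, y))
    nx = pavlov(x, y)
    ny = pavlov(y, x)
    if turn == 9:           # inject a single accidental defection by Y
        ny = 'd'
    x, y = nx, ny

# Turn 10 is (c, d): X lost (S) and switches to d, while Y "won" (T) and
# stays on d, giving (d, d) on turn 11. Both then lose (P), both switch,
# and mutual cooperation resumes from turn 12 onward.
print(history[10], history[11], history[12:])
```

Two Pavlov players thus repair a single accidental defection within two turns, which is one sense in which the strategy favours co-players behaving like itself.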
Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England introduced a new strategy at the 20th-anniversary iterated prisoner's dilemma competition, which proved to be more successful than tit for tat. This strategy relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result, the 2004 Prisoners' Dilemma Tournament results show University of Southampton's strategies in the first three places, despite having fewer wins and many more losses than the GRIM strategy. (In a PD tournament, the aim of the game is not to "win" matches – that can easily be achieved by frequent defection). Also, even without implicit collusion between software strategies (exploited by the Southampton team) tit for tat is not always the absolute winner of any given tournament; it would be more precise to say that its long run results over a series of tournaments outperform its rivals. (In any one event a given strategy can be slightly better adjusted to the competition than tit for tat, but tit for tat is more robust). The same applies for the tit for tat with forgiveness variant, and other optimal strategies: on any given day they might not "win" against a specific mix of counter-strategies. An alternative way of putting it is using the Darwinian ESS simulation. 
In such a simulation, tit for tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit for tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. Richard Dawkins showed that here, no static mix of strategies forms a stable equilibrium and the system will always oscillate between bounds. This strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.
 
This strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of minmaxing). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analyzing single agent strategies as compared to Axelrod's seminal tournament. However, it provided a basis for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Dawkins, in his book The Selfish Gene, pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that most probably Axelrod would not have allowed them if they had been submitted. It also relies on circumventing rules about the prisoner's dilemma in that there is no communication allowed between the two players, which the Southampton programs arguably did with their opening "ten move dance" to recognize one another; this only reinforces just how valuable communication can be in shifting the balance of the game.
 
In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities". In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where <math>P_{ab}</math> is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then <math>P_{cd}</math> is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit for tat strategy written as P={1,0,1,0}, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy written as P={1,0,0,1}, in which X responds as in the previous encounter if it was a "win" (i.e. cc or dc) but changes strategy if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.
 
One result of stochastic theory is that there exists a stationary vector v for the matrix M such that <math>v\cdot M=v</math>. Without loss of generality, it may be specified that v is normalized so that the sum of its four components is unity. The ij-th entry in <math>M^n</math> gives the probability that the outcome of an encounter between X and Y will be j, given that the encounter n steps previous was i. In the limit as n approaches infinity, M will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing j, which will be independent of i. In other words, the rows of <math>M^\infty</math> will be identical, giving the long-term equilibrium result probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that v is a stationary vector for <math>M^n</math> and particularly <math>M^\infty</math>, so that each row of <math>M^\infty</math> will be equal to v. Thus the stationary vector specifies the equilibrium outcome probabilities for X. Defining <math>S_x=\{R,S,T,P\}</math> and <math>S_y=\{R,T,S,P\}</math> as the short-term payoff vectors for the {cc,cd,dc,dd} outcomes (from X's point of view), the equilibrium payoffs for X and Y can now be specified as <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>, allowing the two strategies P and Q to be compared for their long-term payoffs.
 
The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma (IPD), illustrated in a Venn diagram: Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that are always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies, under both strong and weak selection. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit-for-tat lies at the intersection of cooperating, defecting and ZD strategies.
 
In 2012, William H. Press and Freeman Dyson published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies. The long term payoffs for encounters between X and Y can be expressed as the determinant of a matrix which is a function of the two strategies and the short term payoff vectors: <math>s_x=D(P,Q,S_x)</math> and <math>s_y=D(P,Q,S_y)</math>, which do not involve the stationary vector v. Since the determinant function <math>s_y=D(P,Q,f)</math> is linear in f, it follows that <math>\alpha s_x+\beta s_y+\gamma=D(P,Q,\alpha S_x+\beta S_y+\gamma U)</math> (where U={1,1,1,1}). Any strategies for which <math>D(P,Q,\alpha S_x+\beta S_y+\gamma U)=0</math> is by definition a ZD strategy, and the long term payoffs obey the relation  <math>\alpha s_x+\beta s_y+\gamma=0</math>.
 
Tit-for-tat is a ZD strategy which is "fair" in the sense of not gaining advantage over the other player. However, the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score or alternatively, force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of ultimatum game. Specifically, X is able to choose a strategy for which <math>D(P,Q,\beta S_y+\gamma U)=0</math>, unilaterally setting <math>s_y</math>  to a specific value within a particular range of values, independent of Y 's strategy, offering an opportunity for X to "extort" player Y (and vice versa). (It turns out that if X tries to set <math>s_x</math> to a particular value, the range of possibilities is much smaller, only consisting of complete cooperation or complete defection.)
 
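The extortion relation can be checked numerically. The sketch below uses one extortionate strategy with extortion factor χ = 3, P = {11/13, 1/2, 7/26, 0} (a standard worked example under the usual payoffs T=5, R=3, P=1, S=0), which enforces s_x − P = 3(s_y − P) regardless of what Y plays; the opponent strategies tried here are arbitrary choices:

```python
import numpy as np

S_x = np.array([3, 0, 5, 1])   # X's payoffs for outcomes (cc, cd, dc, dd)
S_y = np.array([3, 5, 0, 1])

def stationary_payoffs(p, q, iters=20000):
    """Long-run payoffs (s_x, s_y) when memory-1 strategy p meets q."""
    q_sw = [q[0], q[2], q[1], q[3]]          # Y labels cd/dc from its own side
    M = np.array([[pc * qc, pc * (1 - qc), (1 - pc) * qc, (1 - pc) * (1 - qc)]
                  for pc, qc in zip(p, q_sw)])
    v = np.full(4, 0.25)
    for _ in range(iters):
        v = v @ M                             # power iteration toward v.M = v
    return v @ S_x, v @ S_y

# Extortionate ZD strategy with chi = 3: whatever Y does, X's surplus over
# the mutual-defection payoff P = 1 is three times Y's surplus.
extort = [11 / 13, 1 / 2, 7 / 26, 0]
for q in ([0.7, 0.2, 0.6, 0.3], [0.9, 0.9, 0.9, 0.9], [0.5, 0.5, 0.5, 0.5]):
    s_x, s_y = stationary_payoffs(extort, q)
    print(round(s_x - 1, 6), round(3 * (s_y - 1), 6))
```

Note Y can only escape by driving both surpluses to zero (mutual defection), which is what makes the situation resemble an ultimatum game.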
An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not evolutionarily stable. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).
 
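A minimal sketch of such dynamics follows (hypothetical code: discrete replicator updates, with pairwise payoffs computed from the stationary distribution; the 5% execution error and the two-strategy population are assumptions for illustration). Starting from an even mix, noisy tit for tat displaces always-defect because its payoff against itself far exceeds what always-defect earns in any pairing:

```python
import numpy as np

S_x = np.array([3, 0, 5, 1])   # payoffs to the row player for (cc, cd, dc, dd)

def long_run_payoff(p, q, iters=20000):
    """Row player's stationary payoff when memory-1 strategy p meets q."""
    q_sw = [q[0], q[2], q[1], q[3]]
    M = np.array([[pc * qc, pc * (1 - qc), (1 - pc) * qc, (1 - pc) * (1 - qc)]
                  for pc, qc in zip(p, q_sw)])
    v = np.full(4, 0.25)
    for _ in range(iters):
        v = v @ M
    return v @ S_x

eps = 0.05                                   # execution noise keeps the chains ergodic
strategies = {
    'tft':  [1 - eps, eps, 1 - eps, eps],    # noisy tit for tat
    'alld': [eps] * 4,                       # noisy always-defect
}
names = list(strategies)
A = np.array([[long_run_payoff(strategies[a], strategies[b]) for b in names]
              for a in names])               # pairwise payoff matrix

x = np.array([0.5, 0.5])                     # start from an even mix
for _ in range(200):                         # discrete replicator dynamics:
    fitness = A @ x                          # more successful strategies multiply,
    x = x * fitness / (x @ fitness)          # less successful ones shrink

print(dict(zip(names, x.round(3))))
```

Here tit for tat fixates because its self-play payoff (about 2.25 with this noise level) dominates the roughly mutual-defection payoffs in every pairing involving always-defect; with other strategy mixes the same machinery can reproduce the instability results described above.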
Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is larger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and win–stay, lose–switch agents.
 
While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies is both stable and robust. In fact, when the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the donation game by Alexander Stewart and Joshua Plotkin in 2013. Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Akin (2013) to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.

虽然勒索性 ZD 策略在大种群中并不稳定,但另一类被称为“慷慨”策略的 ZD 策略既稳定又稳健。事实上,当种群不太小时,这些策略可以取代任何其他 ZD 策略,甚至在面对一大批迭代囚徒困境的常见策略(包括“赢-保持,输-转换”)时也表现良好。亚历山大·斯图尔特(Alexander Stewart)和约书亚·普洛特金(Joshua Plotkin)在 2013 年针对捐赠博弈专门证明了这一点。慷慨策略会与其他合作型玩家合作;而面对叛变时,慷慨的玩家比其对手损失更多的效用。慷慨策略是 ZD 策略与所谓“好”策略的交集;Akin(2013)将“好”策略定义为:玩家以未来的合作回应过去的相互合作,并且只要自己至少获得合作的期望收益,就平均分配期望收益。在好策略中,当种群不太小时,慷慨(ZD)子集表现良好;当种群非常小时,叛变策略则往往占据主导地位。
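A generous ZD strategy can be written down with the same machinery: under the Press–Dyson parametrization, forcing the payoff line s_X − R = χ(s_Y − R) through the mutual-cooperation point (R, R) with χ > 1 yields a strategy that always cooperates after mutual cooperation and, when scores fall below R, loses more than its rival, matching the description above. The concrete numbers below (χ = 2, φ = 0.1, a fully random opponent) are illustrative choices, not parameters from the cited papers.

```python
# A "generous" zero-determinant strategy: enforce the line
# s_X - R = chi * (s_Y - R) through (R, R) with chi > 1 via the
# Press-Dyson parametrization. chi = 2 and phi = 0.1 are illustrative.

T, R, P, S = 5, 3, 1, 0
chi, phi = 2.0, 0.1
p = [1.0,                                   # always cooperate after CC
     1 - phi * ((R - S) + chi * (T - R)),   # after CD
     phi * ((T - R) + chi * (R - S)),       # after DC
     phi * (R - P) * (chi - 1)]             # after DD
assert all(0.0 <= x <= 1.0 for x in p)      # a valid strategy

q = [0.5] * 4                               # a fully random opponent
M = [[p[i] * q[i], p[i] * (1 - q[i]),
      (1 - p[i]) * q[i], (1 - p[i]) * (1 - q[i])]
     for i in range(4)]
v = [0.25] * 4                              # power iteration as before
for _ in range(10000):
    v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]

sx = sum(vi * f for vi, f in zip(v, [R, S, T, P]))
sy = sum(vi * f for vi, f in zip(v, [R, T, S, P]))
print(sx, sy)   # both scores sit on the line sx - R = 2 * (sy - R)
```

Against the random opponent both scores fall below R, and the generous player's shortfall is twice its rival's: it absorbs the larger share of the damage from defection, which is the property the text attributes to generous strategies.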
Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. The basic intuition for this result is straightforward: in a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from assorting with one another. By contrast, in a discrete prisoner's dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit for tat-like cooperation are extremely rare in nature (ex. Hammerstein<ref>Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, Editor, Genetic and Cultural Evolution of Cooperation, MIT Press. pp. 83–94.</ref>) even though tit for tat seems robust in theoretical models.

关于迭代囚徒困境的大多数研究都集中在离散情形,即参与者要么合作、要么叛变,因为这种模型相对容易分析。然而,也有一些研究者考察了连续迭代囚徒困境的模型,其中玩家可以向对方做出数量可变的贡献。Le 和 Boyd 发现,在这种情形下,合作远比在离散迭代囚徒困境中更难演化出来。这一结果的基本直觉很直接:在连续囚徒困境中,如果种群一开始处于非合作均衡,那么仅比不合作者稍微合作一点的玩家,从彼此结对中获益甚微;相比之下,在离散囚徒困境中,相对于不合作者,以牙还牙式的合作者在非合作均衡中彼此结对能获得很大的收益提升。由于自然界提供的可以说更多是程度可变的合作机会,而非合作与叛变的严格二分,连续囚徒困境或许有助于解释为何类似以牙还牙的合作在自然界中的真实例子极为罕见(例如 Hammerstein, P. (2003),载于 Genetic and Cultural Evolution of Cooperation,MIT Press,第 83–94 页),尽管以牙还牙在理论模型中看起来很稳健。
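The "little benefit from assorting" intuition can be made concrete with a linear continuous PD: contributing a level x costs the contributor c·x and gives the partner b·x per round, so the per-round payoff against a partner contributing y is b·y − c·x. The values b = 3, c = 1, 10 rounds, and the 0.1 contribution level below are illustrative, not from Le and Boyd's model.

```python
# Sketch of the Le & Boyd intuition: a rare variant that contributes
# only slightly more than defectors gains almost nothing from meeting
# its own kind, while discrete tit-for-tat pairs jump straight to full
# cooperation. Per-round payoff for contributing x against a partner
# contributing y is b*y - c*x, with b > c > 0.

b, c, rounds = 3.0, 1.0, 10

def payoff(x, y):
    return b * y - c * x

defector_pair = rounds * payoff(0.0, 0.0)   # pair of pure defectors
marginal_pair = rounds * payoff(0.1, 0.1)   # pair of marginal cooperators
tft_pair      = rounds * payoff(1.0, 1.0)   # pair of full (TFT-like) cooperators

print(marginal_pair - defector_pair)   # small advantage from assorting
print(tft_pair - defector_pair)        # large advantage from assorting
```

The marginal cooperators' advantage over the defecting background is an order of magnitude smaller than the discrete cooperators', so selection has far less to work with when cooperation must ramp up gradually.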
     