{{other uses}}

{{short description|Canonical example of a game analyzed in game theory}}

{| class="wikitable floatright"
|+ Prisoner's dilemma payoff matrix
! {{diagonal split header|A|B}}
! B stays<br />silent
! B<br />betrays
|-
! A stays<br />silent
| {{diagonal split header|-1|-1|transparent}}
| {{diagonal split header|-3|0|transparent}}
|-
! A<br />betrays
| {{diagonal split header|0|-3|transparent}}
| {{diagonal split header|-2|-2|transparent}}
|}

The '''prisoner's dilemma''' is a standard example of a game analyzed in [[game theory]] that shows why two completely [[Rationality#Economics|rational]] individuals might not cooperate, even if it appears that it is in their best interests to do so. It was originally framed by [[Merrill Flood]] and [[Melvin Dresher]] while working at [[RAND Corporation|RAND]] in 1950. [[Albert W. Tucker]] formalized the game with prison sentence rewards and named it "prisoner's dilemma",<ref>Poundstone, 1992</ref> presenting it as follows:

{{quote|Two members of a criminal gang are arrested and imprisoned. Each prisoner is in solitary confinement with no means of communicating with the other. The prosecutors lack sufficient evidence to convict the pair on the principal charge, but they have enough to convict both on a lesser charge. Simultaneously, the prosecutors offer each prisoner a bargain. Each prisoner is given the opportunity either to betray the other by testifying that the other committed the crime, or to cooperate with the other by remaining silent. The possible outcomes are:

* If A and B each betray the other, each of them serves two years in prison
* If A betrays B but B remains silent, A will be set free and B will serve three years in prison
* If A remains silent but B betrays A, A will serve three years in prison and B will be set free
* If A and B both remain silent, both of them will serve only one year in prison (on the lesser charge).}}
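
As an illustration (not part of the original formulation), the four outcomes above can be encoded as a small lookup table; the names `SENTENCES` and `years` below are purely illustrative:

```python
# Prison terms (years for A, years for B) for each pair of choices.
# "silent" = cooperate with the accomplice, "betray" = defect.
SENTENCES = {
    ("betray", "betray"): (2, 2),  # both betray: two years each
    ("betray", "silent"): (0, 3),  # A goes free, B serves three years
    ("silent", "betray"): (3, 0),  # A serves three years, B goes free
    ("silent", "silent"): (1, 1),  # both silent: one year each on the lesser charge
}

def years(a_choice, b_choice):
    """Return the prison terms (A, B) for one play of the game."""
    return SENTENCES[(a_choice, b_choice)]
```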

It is implied that the prisoners will have no opportunity to reward or punish their partner other than the prison sentences they get and that their decision will not affect their reputation in the future. Because betraying a partner offers a greater reward than cooperating with them, all purely rational self-interested prisoners will betray the other, meaning the only possible outcome for two purely rational prisoners is for them to betray each other.<ref>{{cite web|last=Milovsky|first=Nicholas|title=The Basics of Game Theory and Associated Games|url=https://issuu.com/johnsonnick895/docs/game_theory_paper|accessdate=11 February 2014}}</ref> In reality, humans display a [[systemic bias]] towards cooperative behavior in this and similar games despite what is predicted by simple models of "rational" self-interested action.<ref name = Fehr>{{cite journal | last1=Fehr | first1= Ernst | last2=Fischbacher | first2=Urs | date= Oct 23, 2003 | title=The Nature of human altruism |journal=Nature | volume=425 | pages=785–91 | doi=10.1038/nature02043 | url=http://www.iwp.jku.at/born/mpwfst/04/nature02043_f_born.pdf | accessdate=February 27, 2013 | pmid=14574401 | issue=6960|bibcode = 2003Natur.425..785F }}</ref><ref name = Amos>{{cite book | title=Preference, belief, and similarity: selected writings. 
| publisher=Massachusetts Institute of Technology Press | first1= Amos | last1=Tversky | first2=Eldar | last2=Shafir | url=http://cseweb.ucsd.edu/~gary/PAPER-SUGGESTIONS/Preference,%20Belief,%20and%20Similarity%20Selected%20Writings%20(Bradford%20Books).pdf | year=2004 | isbn=9780262700931 | accessdate=February 27, 2013}}</ref><ref name="Ahn">{{cite journal |last1 = Toh-Kyeong|first1 = Ahn|last2 = Ostrom|first2 = Elinor|last3 = Walker|first3 = James|date = Sep 5, 2002|title = Incorporating Motivational Heterogeneity into Game-Theoretic Models of Collective Action|journal = Public Choice|volume = 117|issue = 3–4|pages = 295–314|doi =10.1023/b:puch.0000003739.54365.fd |url = http://www.indiana.edu/~workshop/seminars/papers/ahnostromwalker_092402.pdf|accessdate = June 27, 2015|hdl = 10535/4697}}</ref><ref name="Hessel">{{cite journal|last1 = Oosterbeek|first1 = Hessel|last2 = Sloof|first2 = Randolph|last3 = Van de Kuilen|first3 = Gus|date = Dec 3, 2003|title = Cultural Differences in Ultimatum Game Experiments: Evidence from a Meta-Analysis|journal = Experimental Economics|volume = 7|issue = 2|pages = 171–88|doi = 10.1023/B:EXEC.0000026978.14316.74|url = http://www.econ.nagoya-cu.ac.jp/~yhamagu/ultimatum.pdf|accessdate = February 27, 2013|url-status = dead|archiveurl = https://web.archive.org/web/20130512175243/http://www.econ.nagoya-cu.ac.jp/~yhamagu/ultimatum.pdf|archivedate = May 12, 2013}}</ref> This bias towards cooperation has been known since the test was first conducted at RAND; the secretaries involved trusted each other and worked together for the best common outcome.<ref>{{Cite book | url=https://books.google.com/?id=WIhZlB86nJwC&pg=PT96&lpg=PT96&dq=rand+secretaries+prisoner%27s+dilemma#v=onepage |title = Why Most Things Fail|isbn = 9780571266142|last1 = Ormerod|first1 = Paul|date = 2010-12-22}}</ref> The prisoner's dilemma became the focus of extensive experimental research.<ref>Deutsch, M. (1958). Trust and suspicion. 
Journal of Conflict Resolution, 2(4), 265–279. https://doi.org/10.1177/002200275800200401</ref> <ref>Rapoport, A., & Chammah, A. M. (1965). Prisoner’s Dilemma: A study of conflict and cooperation. Ann Arbor, MI: University of Michigan Press.</ref>

An extended "iterated" version of the game also exists. In this version, the classic game is played repeatedly between the same prisoners, who continuously have the opportunity to penalize the other for previous decisions. If the number of times the game will be played is known to the players, then (by [[backward induction]]) two classically rational players will betray each other repeatedly, for the same reasons as the single-shot variant. In an infinite or unknown length game there is no fixed optimum strategy, and prisoner's dilemma tournaments have been held to compete and test algorithms for such cases.<ref>{{cite journal|url = https://egtheory.wordpress.com/2015/03/02/ipd/|title = Short history of iterated prisoner's dilemma tournaments|date = March 2, 2015|access-date = February 8, 2016|journal = Journal of Conflict Resolution|volume = 24|issue = 3|pages = 379–403|last = Kaznatcheev|first = Artem|doi = 10.1177/002200278002400301}}</ref>

The prisoner's dilemma game can be used as a model for many [[#Real-life examples|real world situations]] involving cooperative behavior. In casual usage, the label "prisoner's dilemma" may be applied to situations not strictly matching the formal criteria of the classic or iterative games: for instance, those in which two entities could gain important benefits from cooperating or suffer from the failure to do so, but find it difficult or expensive—not necessarily impossible—to coordinate their activities.

==Strategy for the prisoner's dilemma==

Two prisoners are separated into individual rooms and cannot communicate with each other.

The normal game is shown below:

{| class="wikitable"
|-
! {{diagonal split header|<br />Prisoner A|Prisoner B}} !! Prisoner B stays silent<br>(''cooperates'') !! Prisoner B betrays<br>(''defects'')
|-
! Prisoner A stays silent<br>(''cooperates'')
| Each serves 1 year|| Prisoner A: 3 years<br />Prisoner B: goes free
|-
! Prisoner A betrays<br>(''defects'')
| Prisoner A: goes free<br />Prisoner B: 3 years || Each serves 2 years
|}

It is assumed that both prisoners understand the nature of the game, have no loyalty to each other, and will have no opportunity for retribution or reward outside the game. Regardless of what the other decides, each prisoner gets a higher reward by betraying the other ("defecting"). The reasoning involves an argument by [[Dilemma#Use in logic|dilemma]]: B will either cooperate or defect. If B cooperates, A should defect, because going free is better than serving 1 year. If B defects, A should also defect, because serving 2 years is better than serving 3. So either way, A should defect. Parallel reasoning will show that B should defect.

Because defection always results in a better payoff than cooperation regardless of the other player's choice, it is a [[dominant strategy]]. Mutual defection is the only strong [[Nash equilibrium]] in the game (i.e. the only outcome from which each player could only do worse by unilaterally changing strategy). The dilemma, then, is that mutual cooperation yields a better outcome than mutual defection but is not the rational outcome because the choice to cooperate, from a self-interested perspective, is irrational.
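
The argument by dilemma can also be checked mechanically by enumerating all strategy pairs. A minimal sketch (payoffs are the negated sentence lengths from the table above, so higher is better; the helper names are illustrative):

```python
from itertools import product

# Payoffs as (to A, to B), written as negated prison years so larger is better.
PAYOFF = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3,  0),
    ("D", "C"): ( 0, -3),
    ("D", "D"): (-2, -2),
}

def is_nash(a, b):
    """(a, b) is a Nash equilibrium if neither player can gain by a
    unilateral change of strategy."""
    pa, pb = PAYOFF[(a, b)]
    return (all(PAYOFF[(a2, b)][0] <= pa for a2 in "CD")
            and all(PAYOFF[(a, b2)][1] <= pb for b2 in "CD"))

# Brute force over the four strategy profiles: only (D, D) survives,
# even though (C, C) gives both players a strictly better payoff.
equilibria = [s for s in product("CD", repeat=2) if is_nash(*s)]
```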

==Generalized form==

The structure of the traditional prisoner's dilemma can be generalized from its original prisoner setting. Suppose that the two players are represented by the colors red and blue, and that each player chooses to either "cooperate" or "defect".

If both players cooperate, they both receive the reward ''R'' for cooperating. If both players defect, they both receive the punishment payoff ''P''. If Blue defects while Red cooperates, then Blue receives the temptation payoff ''T'', while Red receives the "sucker's" payoff, ''S''. Similarly, if Blue cooperates while Red defects, then Blue receives the sucker's payoff ''S'', while Red receives the temptation payoff ''T''.

This can be expressed in [[Normal-form game|normal form]]:

{| class="wikitable" style="text-align:center"
|+ Canonical PD payoff matrix
! {{diagonal split header|{{color|#009|Blue}}|{{color|#900|Red}}}}
! scope="col" style="width:60px;" | {{color|#900|Cooperate}}
! scope="col" style="width:60px;" | {{color|#900|Defect}}
|-
! scope="row" style="width:60px;" | {{color|#009|Cooperate}}
| {{diagonal split header|{{color|#009|''R''}}|{{color|#900|''R''}}|transparent}}
| {{diagonal split header|{{color|#009|''S''}}|{{color|#900|''T''}}|transparent}}
|-
! scope="row" | {{color|#009|Defect}}
| {{diagonal split header|{{color|#009|''T''}}|{{color|#900|''S''}}|transparent}}
| {{diagonal split header|{{color|#009|''P''}}|{{color|#900|''P''}}|transparent}}
|}

and to be a prisoner's dilemma game in the strong sense, the following condition must hold for the payoffs:

:{{tmath|T > R > P > S}}

The payoff relationship {{tmath|R > P}} implies that mutual cooperation is superior to mutual defection, while the payoff relationships {{tmath|T > R}} and {{tmath|P > S}} imply that defection is the [[dominant strategy]] for both agents.
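
As a sketch, the chain of inequalities can be written as a single predicate and checked against the canonical sentence-length payoffs (going free T = 0, one year R = -1, two years P = -2, three years S = -3); the function name is illustrative:

```python
def is_strong_pd(T, R, P, S):
    """Strong-sense prisoner's dilemma: temptation > reward > punishment > sucker."""
    return T > R > P > S

# The canonical game, with sentences negated so that larger is better.
classic_ok = is_strong_pd(0, -1, -2, -3)
```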

===Special case: donation game===

The "donation game"<ref name=Hilbe2013>{{cite journal|last=Hilbe|first=Christian |author2=Martin A. Nowak |author3=Karl Sigmund|title=Evolution of extortion in Iterated Prisoner's Dilemma games|journal=PNAS|date=April 2013|volume=110|issue=17|pages=6913–18|doi=10.1073/pnas.1214834110|pmid=23572576 |pmc=3637695 |bibcode=2013PNAS..110.6913H |arxiv=1212.1067}}</ref> is a form of prisoner's dilemma in which cooperation corresponds to offering the other player a benefit ''b'' at a personal cost ''c'' with ''b'' > ''c''. Defection means offering nothing. The payoff matrix is thus

{| class="wikitable" style="text-align:center"
! {{diagonal split header|{{color|#009|Blue}}|{{color|#900|Red}}}}
! scope="col" style="width:60px;" | {{color|#900|Cooperate}}
! scope="col" style="width:60px;" | {{color|#900|Defect}}
|-
! scope="row" style="width:60px;" | {{color|#009|Cooperate}}
| {{diagonal split header|{{color|#009|''b''-''c''}}|{{color|#900|''b''-''c''}}|transparent}}
| {{diagonal split header|{{color|#009|-''c''}}|{{color|#900|''b''}}|transparent}}
|-
! scope="row" | {{color|#009|Defect}}
| {{diagonal split header|{{color|#009|''b''}}|{{color|#900|-''c''}}|transparent}}
| {{diagonal split header|{{color|#009|0}}|{{color|#900|0}}|transparent}}
|}

Note that {{tmath|2R>T+S}} (i.e. {{tmath|2(b-c)>b-c}}) which qualifies the donation game to be an iterated game (see next section).

The donation game may be applied to markets. Suppose X grows oranges, Y grows apples. The [[marginal utility]] of an apple to the orange-grower X is ''b'', which is higher than the marginal utility (''c'') of an orange, since X has a surplus of oranges and no apples. Similarly, for apple-grower Y, the marginal utility of an orange is ''b'' while the marginal utility of an apple is ''c''. If X and Y contract to exchange an apple and an orange, and each fulfills their end of the deal, then each receive a payoff of ''b''-''c''. If one "defects" and does not deliver as promised, the defector will receive a payoff of ''b'', while the cooperator will lose ''c''. If both defect, then neither one gains or loses anything.
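
Under the stated assumption ''b'' > ''c'' > 0, the mapping of the donation game onto (''T'', ''R'', ''P'', ''S'') can be sketched as follows (function names are illustrative, not standard notation):

```python
def donation_payoffs(b, c):
    """Donation game as (T, R, P, S): defecting against a cooperator keeps the
    benefit b, mutual cooperation yields b - c each, mutual defection yields 0,
    and cooperating alone costs c."""
    return b, b - c, 0, -c

def is_strong_pd(T, R, P, S):
    return T > R > P > S

# With any b > c > 0 (here b = 3, c = 1) the ordering T > R > P > S holds,
# and the iterated-game condition 2R > T + S reduces to 2(b - c) > b - c,
# which is just b > c again.
T, R, P, S = donation_payoffs(3, 1)
```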

==The iterated prisoner's dilemma==

{{more citations needed section|date=November 2012}}

If two players play prisoner's dilemma more than once in succession and they remember previous actions of their opponent and change their strategy accordingly, the game is called iterated prisoner's dilemma.

In addition to the general form above, the iterative version also requires that {{tmath|2R > T + S}}, to prevent alternating cooperation and defection giving a greater reward than mutual cooperation.

The iterated prisoner's dilemma game is fundamental to some theories of human cooperation and trust. On the assumption that the game can model transactions between two people requiring trust, cooperative behaviour in populations may be modeled by a multi-player, iterated, version of the game. It has, consequently, fascinated many scholars over the years. In 1975, Grofman and Pool estimated the count of scholarly articles devoted to it at over 2,000. The iterated prisoner's dilemma has also been referred to as the "[[Peace war game|peace-war game]]".<ref name = Shy>{{cite book | title= Industrial Organization: Theory and Applications | publisher=Massachusetts Institute of Technology Press | first1= Oz | last1=Shy |url=https://books.google.com/?id=tr4CjJ5LlRcC&pg=PR13&dq=industrial+organization+theory+and+applications | year=1995 | isbn=978-0262193665 | accessdate=February 27, 2013}}</ref>

If the game is played exactly ''N'' times and both players know this, then it is optimal to defect in all rounds. The only possible [[Nash equilibrium]] is to always defect. The proof is [[Mathematical induction|inductive]]: one might as well defect on the last turn, since the opponent will not have a chance to later retaliate. Therefore, both will defect on the last turn. Thus, the player might as well defect on the second-to-last turn, since the opponent will defect on the last no matter what is done, and so on. The same applies if the game length is unknown but has a known upper limit.
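
The unraveling argument can be sketched in code: since the continuation play after any round is the same whatever is chosen now, each round reduces to a one-shot game in which defection strictly dominates. The sketch below only illustrates that reduction; it is not a general game solver, and the names are hypothetical:

```python
# One-shot payoff to player A (negated prison years): keys are (A's move,
# opponent's move).
PAYOFF_A = {("C", "C"): -1, ("C", "D"): -3, ("D", "C"): 0, ("D", "D"): -2}

def backward_induction_plan(rounds):
    """Work backwards from the final round. Because the continuation value
    does not depend on the current moves, each round is a one-shot game, and
    A should defect whenever D strictly dominates C against every reply."""
    plan = []
    for _ in range(rounds):
        d_dominates = all(PAYOFF_A[("D", b)] > PAYOFF_A[("C", b)] for b in "CD")
        plan.append("D" if d_dominates else "C")
    return plan
```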

Unlike the standard prisoner's dilemma, in the iterated prisoner's dilemma the defection strategy is counter-intuitive and fails badly to predict the behavior of human players. Within standard economic theory, though, this is the only correct answer. The [[superrational]] strategy in the iterated prisoner's dilemma with fixed ''N'' is to cooperate against a superrational opponent, and in the limit of large ''N'', experimental results on strategies agree with the superrational version, not the game-theoretic rational one.

For [[cooperation]] to emerge between game theoretic rational players, the total number of rounds ''N'' must be unknown to the players. In this case "always defect" may no longer be a strictly dominant strategy, only a Nash equilibrium. Amongst results shown by [[Robert Aumann]] in a 1959 paper, rational players repeatedly interacting for indefinitely long games can sustain the cooperative outcome.

According to a 2019 experimental study in the ''American Economic Review'' which tested what strategies real-life subjects used in iterated prisoners' dilemma situations with perfect monitoring, the majority of chosen strategies were always defect, [[Tit for tat|tit-for-tat]], and [[Grim trigger]]. Which strategy the subjects chose depended on the parameters of the game.<ref>{{Cite journal|last=Dal Bó|first=Pedro|last2=Fréchette|first2=Guillaume R.|date=2019|title=Strategy Choice in the Infinitely Repeated Prisoner's Dilemma|journal=American Economic Review|language=en|volume=109|issue=11|pages=3929–3952|doi=10.1257/aer.20181480|issn=0002-8282}}</ref>

===Strategy for the iterated prisoner's dilemma===

Interest in the iterated prisoner's dilemma (IPD) was kindled by [[Robert Axelrod]] in his book ''[[The Evolution of Cooperation]]'' (1984). In it he reports on a tournament he organized of the ''N'' step prisoner's dilemma (with ''N'' fixed) in which participants have to choose their mutual strategy again and again, and have memory of their previous encounters. Axelrod invited academic colleagues all over the world to devise computer strategies to compete in an IPD tournament. The programs that were entered varied widely in algorithmic complexity, initial hostility, capacity for forgiveness, and so forth.

Axelrod discovered that when these encounters were repeated over a long period of time with many players, each with different strategies, greedy strategies tended to do very poorly in the long run while more [[altruism|altruistic]] strategies did better, as judged purely by self-interest. He used this to show a possible mechanism for the evolution of altruistic behaviour from mechanisms that are initially purely selfish, by [[natural selection]].

The winning [[deterministic algorithm|deterministic]] strategy was tit for tat, which [[Anatol Rapoport]] developed and entered into the tournament. It was the simplest of any program entered, containing only four lines of [[BASIC]], and won the contest. The strategy is simply to cooperate on the first iteration of the game; after that, the player does what his or her opponent did on the previous move. Depending on the situation, a slightly better strategy can be "tit for tat with forgiveness". When the opponent defects, on the next move, the player sometimes cooperates anyway, with a small probability (around 1–5%). This allows for occasional recovery from getting trapped in a cycle of defections. The exact probability depends on the line-up of opponents.
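
Tit for tat and the forgiving variant described above can be sketched in a few lines. The following Python sketch is an illustration (the original winning entry was four lines of BASIC, not reproduced here); moves are encoded as "C"/"D", and the forgiveness probability defaults to a value in the 1–5% range mentioned above:

```python
import random

def tit_for_tat(my_history, opp_history):
    """Cooperate first; afterwards repeat the opponent's previous move."""
    if not opp_history:
        return "C"
    return opp_history[-1]

def forgiving_tit_for_tat(my_history, opp_history, forgive_prob=0.05):
    """Tit for tat, but after an opponent defection cooperate anyway with
    a small probability, escaping long cycles of mutual retaliation."""
    if not opp_history:
        return "C"
    if opp_history[-1] == "D" and random.random() < forgive_prob:
        return "C"
    return opp_history[-1]

def play(strategy_a, strategy_b, rounds):
    """Run an iterated game, returning both players' move histories."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        hist_a.append(move_a)
        hist_b.append(move_b)
    return hist_a, hist_b
```

Two tit-for-tat players cooperate on every round, while against an unconditional defector the strategy defects from the second round on.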

By analysing the top-scoring strategies, Axelrod stated several conditions necessary for a strategy to be successful.

; Nice: The most important condition is that the strategy must be "nice", that is, it will not defect before its opponent does (this is sometimes referred to as an "optimistic" algorithm). Almost all of the top-scoring strategies were nice; therefore, a purely selfish strategy will not "cheat" on its opponent, for purely self-interested reasons first.

; Retaliating: However, Axelrod contended, the successful strategy must not be a blind optimist. It must sometimes retaliate. An example of a non-retaliating strategy is Always Cooperate. This is a very bad choice, as "nasty" strategies will ruthlessly exploit such players.

; Forgiving: Successful strategies must also be forgiving. Though players will retaliate, they will once again fall back to cooperating if the opponent does not continue to defect. This stops long runs of revenge and counter-revenge, maximizing points.

; Non-envious: The last quality is being non-envious, that is not striving to score more than the opponent.

The optimal (points-maximizing) strategy for the one-time PD game is simply defection; as explained above, this is true whatever the composition of opponents may be. However, in the iterated-PD game the optimal strategy depends upon the strategies of likely opponents, and how they will react to defections and cooperations. For example, consider a population where everyone defects every time, except for a single individual following the tit for tat strategy. That individual is at a slight disadvantage because of the loss on the first turn. In such a population, the optimal strategy for that individual is to defect every time. In a population with a certain percentage of always-defectors and the rest being tit for tat players, the optimal strategy for an individual depends on the percentage, and on the length of the game.
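
The first-turn disadvantage can be made concrete. With conventional one-shot payoffs T > R > P > S (the specific values below are an assumption for illustration), tit for tat finishes each match in an all-defector population exactly P − S points behind a native defector, regardless of the match length:

```python
# Conventional one-shot payoffs, chosen for illustration (any T > R > P > S works):
T, R, P, S = 5, 3, 1, 0
N = 100  # rounds per match

# Against an unconditional defector, tit for tat cooperates once (earning S)
# and then defects for the remaining N-1 rounds (earning P each time):
tft_vs_alld = S + (N - 1) * P

# A defector facing another defector earns P every round:
alld_vs_alld = N * P

# Tit for tat's per-match handicap in an all-defector population:
handicap = alld_vs_alld - tft_vs_alld   # equals P - S, independent of N
```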

In the strategy called Pavlov, [[win-stay, lose-switch]], faced with a failure to cooperate, the player switches strategy the next turn.<ref>http://www.pnas.org/content/pnas/93/7/2686.full.pdf</ref> In certain circumstances,{{specify|date=November 2012}} Pavlov beats all other strategies by giving preferential treatment to co-players using a similar strategy.
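
A Python sketch of Pavlov (an illustration; moves are encoded as "C"/"D", and a round is treated as a "win" whenever the opponent cooperated):

```python
def pavlov(my_history, opp_history):
    """Win-stay, lose-switch: repeat the previous move after a 'win'
    (the opponent cooperated), switch after a 'loss' (the opponent
    defected); cooperate on the first move."""
    if not my_history:
        return "C"
    if opp_history[-1] == "C":                     # win: keep the same move
        return my_history[-1]
    return "D" if my_history[-1] == "C" else "C"   # loss: switch moves
```

Against an unconditional defector Pavlov alternates C, D, C, D, … since every round is a loss; against an unconditional cooperator it locks into cooperation.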

Deriving the optimal strategy is generally done in two ways:

* [[Bayesian Nash equilibrium]]: If the statistical distribution of opposing strategies can be determined (e.g. 50% tit for tat, 50% always cooperate) an optimal counter-strategy can be derived analytically.{{efn|1=For example see the 2003 study<ref>{{cite web|url= http://econ.hevra.haifa.ac.il/~mbengad/seminars/whole1.pdf|title=Bayesian Nash equilibrium; a statistical test of the hypothesis|url-status=dead|archive-url= https://web.archive.org/web/20051002195142/http://econ.hevra.haifa.ac.il/~mbengad/seminars/whole1.pdf|archive-date=2005-10-02|publisher=[[Tel Aviv University]]}}</ref> for discussion of the concept and whether it can apply in real [[economic]] or strategic situations.}}

* [[Monte Carlo method|Monte Carlo]] simulations of populations have been made, where individuals with low scores die off, and those with high scores reproduce (a [[genetic algorithm]] for finding an optimal strategy). The mix of algorithms in the final population generally depends on the mix in the initial population. The introduction of mutation (random variation during reproduction) lessens the dependency on the initial population; empirical experiments with such systems tend to produce tit for tat players (see for instance Chess 1988),{{Clarify|date=August 2016}} but no analytic proof exists that this will always occur.<ref>{{Citation|last=Wu|first=Jiadong|title=Cooperation on the Monte Carlo Rule: Prisoner's Dilemma Game on the Grid|date=2019|work=Theoretical Computer Science|volume=1069|pages=3–15|editor-last=Sun|editor-first=Xiaoming|publisher=Springer Singapore|language=en|doi=10.1007/978-981-15-0105-0_1|isbn=978-981-15-0104-3|last2=Zhao|first2=Chengye|editor2-last=He|editor2-first=Kun|editor3-last=Chen|editor3-first=Xiaoyun}}</ref>
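
A minimal deterministic sketch of such a population dynamic (fitness-proportional reproduction over population shares rather than a full genetic algorithm with mutation; the 10-round match scores below assume the conventional payoffs T=5, R=3, P=1, S=0):

```python
# Total scores over a 10-round iterated game, with payoffs T=5, R=3, P=1, S=0
# (an assumption): score[i][j] is what strategy i earns against strategy j.
score = {
    "AllD": {"AllD": 10, "AllC": 50, "TFT": 14},
    "AllC": {"AllD": 0,  "AllC": 30, "TFT": 30},
    "TFT":  {"AllD": 9,  "AllC": 30, "TFT": 30},
}

def replicate(pop, generations):
    """Fitness-proportional update of population shares: low scorers
    shrink, high scorers grow (a deterministic, mutation-free stand-in
    for the Monte Carlo simulations described above)."""
    names = list(pop)
    for _ in range(generations):
        fitness = {i: sum(score[i][j] * pop[j] for j in names) for i in names}
        mean = sum(pop[i] * fitness[i] for i in names)
        pop = {i: pop[i] * fitness[i] / mean for i in names}
    return pop

final = replicate({"AllD": 1 / 3, "AllC": 1 / 3, "TFT": 1 / 3}, 500)
```

Starting from equal shares, always-defect briefly grows by exploiting always-cooperate, then collapses as its prey disappears, and tit for tat ends up the most common strategy; a small always-cooperate minority survives alongside it, echoing Dawkins' observation that a tit-for-tat population is penetrable by non-retaliating nice strategies.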



Although tit for tat is considered to be the most [[robust]] basic strategy, a team from [[Southampton University]] in England introduced a new strategy at the 20th-anniversary iterated prisoner's dilemma competition, which proved to be more successful than tit for tat. This strategy relied on collusion between programs to achieve the highest number of points for a single program. The university submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start.<ref>{{cite press release|url= http://www.southampton.ac.uk/mediacentre/news/2004/oct/04_151.shtml|publisher=University of Southampton|title=University of Southampton team wins Prisoner's Dilemma competition|date=7 October 2004|url-status=dead|archive-url= https://web.archive.org/web/20140421055745/http://www.southampton.ac.uk/mediacentre/news/2004/oct/04_151.shtml|archive-date=2014-04-21}}</ref> Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result,{{efn|The 2004 Prisoners' Dilemma Tournament results show [[University of Southampton]]'s strategies in the first three places, despite having fewer wins and many more losses than the GRIM strategy. (In a PD tournament, the aim of the game is not to "win" matches&nbsp;– that can easily be achieved by frequent defection.) Also, even without implicit collusion between [[computer program|software strategies]] (exploited by the Southampton team) tit for tat is not always the absolute winner of any given tournament; it would be more precise to say that its long-run results over a series of tournaments outperform its rivals'. (In any one event a given strategy can be slightly better adjusted to the competition than tit for tat, but tit for tat is more robust.) The same applies for the tit for tat with forgiveness variant, and other optimal strategies: on any given day they might not "win" against a specific mix of counter-strategies. An alternative way of putting it is using the Darwinian [[Evolutionarily stable strategy|ESS]] simulation. In such a simulation, tit for tat will almost always come to dominate, though nasty strategies will drift in and out of the population because a tit for tat population is penetrable by non-retaliating nice strategies, which in turn are easy prey for the nasty strategies. [[Richard Dawkins]] showed that here, no static mix of strategies forms a stable equilibrium and the system will always oscillate between bounds.}} this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.

This strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of [[minmaxing]]). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analyzing single agent strategies as compared to Axelrod's seminal tournament. However, it provided a basis for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Dawkins, in his book ''[[The Selfish Gene]]'', pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that most probably Axelrod would not have allowed them if they had been submitted. It also relies on circumventing rules about the prisoner's dilemma in that there is no communication allowed between the two players, which the Southampton programs arguably did with their opening "ten move dance" to recognize one another; this only reinforces just how valuable communication can be in shifting the balance of the game.

===Stochastic iterated prisoner's dilemma===



In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities".<ref name=Press2012>{{cite journal|last1=Press|first1=WH|last2=Dyson|first2=FJ|title=Iterated Prisoner's Dilemma contains strategies that dominate any evolutionary opponent|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|date=26 June 2012|volume=109|issue=26|pages=10409–13|doi=10.1073/pnas.1206569109|pmid=22615375|pmc=3387070|bibcode=2012PNAS..10910409P}}</ref> In an encounter between player ''X'' and player ''Y'', ''X'' 's strategy is specified by a set of probabilities ''P'' of cooperating with ''Y''. ''P'' is a function of the outcomes of their previous encounters or some subset thereof. If ''P'' is a function of only their most recent ''n'' encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where <math>P_{ab}</math> is the probability that ''X'' will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which ''X'' cooperated and ''Y'' defected, then <math>P_{cd}</math> is the probability that ''X'' will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit for tat strategy written as ''P''={1,0,1,0}, in which ''X'' responds as ''Y'' did in the previous encounter. Another is the [[win–stay, lose–switch]] strategy written as ''P''={1,0,0,1}, in which ''X'' responds as in the previous encounter, if it was a "win" (i.e. cc or dc) but changes strategy if it was a loss (i.e. cd or dd).
It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.<ref name="Press2012"/>
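
In code, a memory-1 strategy is just these four cooperation probabilities. A minimal sketch (the move and state encodings are assumptions for illustration):

```python
import random

STATES = ("cc", "cd", "dc", "dd")   # previous (X's move, Y's move)

# Four cooperation probabilities P_cc, P_cd, P_dc, P_dd per strategy:
tit_for_tat          = {"cc": 1, "cd": 0, "dc": 1, "dd": 0}  # P = {1,0,1,0}
win_stay_lose_switch = {"cc": 1, "cd": 0, "dc": 0, "dd": 1}  # P = {1,0,0,1}

def next_move(strategy, prev_state):
    """Sample X's next move from its cooperation probability P_ab."""
    return "c" if random.random() < strategy[prev_state] else "d"
```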

If we define ''P'' as the above 4-element strategy vector of ''X'' and <math>Q=\{Q_{cc},Q_{cd},Q_{dc},Q_{dd}\}</math> as the 4-element strategy vector of ''Y'', a transition matrix ''M'' may be defined for ''X'' whose ''ij'' th entry is the probability that the outcome of a particular encounter between ''X'' and ''Y'' will be ''j'' given that the previous encounter was ''i'', where ''i'' and ''j'' are one of the four outcome indices: ''cc'', ''cd'', ''dc'', or ''dd''. For example, from ''X'' 's point of view, the probability that the outcome of the present encounter is ''cd'' given that the previous encounter was ''cd'' is equal to <math>M_{cd,cd}=P_{cd}(1-Q_{dc})</math>. (The indices for ''Q'' are from ''Y'' 's point of view: a ''cd'' outcome for ''X'' is a ''dc'' outcome for ''Y''.) Under these definitions, the iterated prisoner's dilemma qualifies as a [[stochastic process]] and ''M'' is a [[stochastic matrix]], allowing all of the theory of stochastic processes to be applied.<ref name="Press2012"/>
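
The transition matrix can be assembled directly from the two strategy vectors (a sketch; states are ordered cc, cd, dc, dd from ''X'' 's point of view, so ''Y'' 's cd and dc entries swap):

```python
def transition_matrix(P, Q):
    """Transition matrix M for X; entry M[i][j] is the probability that
    the next outcome is j given that the previous one was i, with the
    outcomes ordered cc, cd, dc, dd from X's point of view."""
    # Y's cooperation probabilities re-indexed into X's state ordering
    # (Y sees X's cd as dc and vice versa):
    qx = [Q[0], Q[2], Q[1], Q[3]]
    M = []
    for p, q in zip(P, qx):
        # Next state is (X's move, Y's move): cc, cd, dc, dd.
        M.append([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    return M
```

For two tit-for-tat players, for instance, the previous outcome cd leads to dc with certainty: ''X'' copies the defection while ''Y'' copies the cooperation.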

One result of stochastic theory is that there exists a stationary vector ''v'' for the matrix ''M'' such that <math>v\cdot M=v</math>. Without loss of generality, it may be specified that ''v'' is normalized so that the sum of its four components is unity. The ''ij'' th entry in <math>M^n</math> will give the probability that the outcome of an encounter between ''X'' and ''Y'' will be ''j'' given that the encounter ''n'' steps previous is ''i''. In the limit as ''n'' approaches infinity, ''M'' will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing ''j'' which will be independent of ''i''. In other words, the rows of <math>M^\infty</math> will be identical, giving the long-term equilibrium result probabilities of the iterated prisoners dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that ''v'' is a stationary vector for <math>M^n</math> and particularly <math>M^\infty</math>, so that each row of <math>M^\infty</math> will be equal to ''v''. Thus the stationary vector specifies the equilibrium outcome probabilities for ''X''. Defining <math>S_x=\{R,S,T,P\}</math> and <math>S_y=\{R,T,S,P\}</math> as the short-term payoff vectors for the {cc,cd,dc,dd} outcomes (From ''X'' 's point of view), the equilibrium payoffs for ''X'' and ''Y'' can now be specified as <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>, allowing the two strategies ''P'' and ''Q'' to be compared for their long term payoffs.
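
A numerical sketch of this computation, using the conventional payoff values R=3, S=0, T=5, P=1 as an assumption (the text leaves the payoffs general):

```python
import numpy as np

def stationary(M):
    """Left eigenvector of M for eigenvalue 1, normalized to sum to 1."""
    w, V = np.linalg.eig(np.asarray(M, dtype=float).T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

def long_run_payoffs(P, Q, payoffs=(3, 0, 5, 1)):
    """Equilibrium payoffs (s_x, s_y) for memory-1 strategies P and Q.

    States are ordered cc, cd, dc, dd from X's point of view; the
    payoff values R, S, T, P default to the conventional 3, 0, 5, 1."""
    R, S, T, Pd = payoffs
    qx = [Q[0], Q[2], Q[1], Q[3]]  # Y's entries in X's state ordering
    M = [[p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
         for p, q in zip(P, qx)]
    v = stationary(M)
    Sx = np.array([R, S, T, Pd])   # X's payoffs in cc, cd, dc, dd
    Sy = np.array([R, T, S, Pd])   # Y's payoffs: cd and dc swap
    return float(v @ Sx), float(v @ Sy)
```

For example, two win–stay, lose–switch players settle into mutual cooperation (both earn R), while always-defect against always-cooperate earns T against S.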

====Zero-determinant strategies====



[[File:IPD Venn.svg|right|thumb|upright=2.5|The relationship between zero-determinant (ZD), cooperating and defecting strategies in the iterated prisoner's dilemma (IPD) illustrated in a [[Venn diagram]]. Cooperating strategies always cooperate with other cooperating strategies, and defecting strategies always defect against other defecting strategies. Both contain subsets of strategies that are robust under strong selection, meaning no other memory-1 strategy is selected to invade such strategies when they are resident in a population. Only cooperating strategies contain a subset that are always robust, meaning that no other memory-1 strategy is selected to invade and replace such strategies, under both strong and [[weak selection]]. The intersection between ZD and good cooperating strategies is the set of generous ZD strategies. Extortion strategies are the intersection between ZD and non-robust defecting strategies. Tit-for-tat lies at the intersection of cooperating, defecting and ZD strategies.]]

In 2012, [[William H. Press]] and [[Freeman Dyson]] published a new class of strategies for the stochastic iterated prisoner's dilemma called "zero-determinant" (ZD) strategies.<ref name="Press2012"/> The long term payoffs for encounters between ''X'' and ''Y'' can be expressed as the determinant of a matrix which is a function of the two strategies and the short term payoff vectors: <math>s_x=D(P,Q,S_x)</math> and <math>s_y=D(P,Q,S_y)</math>, which do not involve the stationary vector ''v''. Since the determinant function <math>s_y=D(P,Q,f)</math> is linear in ''f'', it follows that <math>\alpha s_x+\beta s_y+\gamma=D(P,Q,\alpha S_x+\beta S_y+\gamma U)</math> (where ''U''={1,1,1,1}). Any strategy for which <math>D(P,Q,\alpha S_x+\beta S_y+\gamma U)=0</math> is by definition a ZD strategy, and the long term payoffs obey the relation <math>\alpha s_x+\beta s_y+\gamma=0</math>.

Tit-for-tat is a ZD strategy which is "fair" in the sense of not gaining advantage over the other player. However, the ZD space also contains strategies that, in the case of two players, can allow one player to unilaterally set the other player's score or alternatively, force an evolutionary player to achieve a payoff some percentage lower than his own. The extorted player could defect but would thereby hurt himself by getting a lower payoff. Thus, extortion solutions turn the iterated prisoner's dilemma into a sort of [[ultimatum game]]. Specifically, ''X'' is able to choose a strategy for which <math>D(P,Q,\beta S_y+\gamma U)=0</math>, unilaterally setting <math>s_y</math> to a specific value within a particular range of values, independent of ''Y'' 's strategy, offering an opportunity for ''X'' to "extort" player ''Y'' (and vice versa). (It turns out that if ''X'' tries to set <math>s_x</math> to a particular value, the range of possibilities is much smaller, only consisting of complete cooperation or complete defection.<ref name="Press2012"/>)
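This enforced linear relation can be verified numerically. The sketch below is a minimal illustration (not code from the cited paper): the extortionate strategy vector is obtained from the Press–Dyson prescription with extortion factor <math>\chi=3</math> under the conventional payoffs ''T'', ''R'', ''P'', ''S'' = 5, 3, 1, 0, and the long-term payoffs are computed from the stationary distribution of the two-player Markov chain.

```python
import numpy as np

def long_run_payoffs(p, q, RSTP=(3, 0, 5, 1)):
    """Long-run payoffs for two memory-one IPD strategies.

    p and q are cooperation probabilities after the outcomes
    (cc, cd, dc, dd), each from that player's own perspective.
    """
    R, S, T, P = RSTP
    # In X's state ordering, X's cd is Y's dc and vice versa,
    # so Y's probabilities q[1] and q[2] are swapped.
    qx = [q[0], q[2], q[1], q[3]]
    M = np.array([[pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
                  for pi, qi in zip(p, qx)])
    # Stationary vector v with v M = v and sum(v) = 1.
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    rhs = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
    v = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return v @ np.array([R, S, T, P]), v @ np.array([R, T, S, P])

# Extortionate ZD strategy with chi = 3, derived from the Press-Dyson formula
extort = (11 / 13, 1 / 2, 7 / 26, 0)
wsls = (1, 0, 0, 1)  # win-stay, lose-shift, used here as an example opponent
s_x, s_y = long_run_payoffs(extort, wsls)
# X's surplus over P is three times Y's: s_x - 1 == 3 * (s_y - 1)
```

The same relation <math>s_x-P=3(s_y-P)</math> holds whatever strategy ''Y'' plays, which is what makes the strategy extortionate: ''Y'' can raise its own score only by raising ''X'''s score three times as much.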




An extension of the IPD is an evolutionary stochastic IPD, in which the relative abundance of particular strategies is allowed to change, with more successful strategies relatively increasing. This process may be accomplished by having less successful players imitate the more successful strategies, or by eliminating less successful players from the game, while multiplying the more successful ones. It has been shown that unfair ZD strategies are not [[evolutionarily stable strategy|evolutionarily stable]]. The key intuition is that an evolutionarily stable strategy must not only be able to invade another population (which extortionary ZD strategies can do) but must also perform well against other players of the same type (which extortionary ZD players do poorly, because they reduce each other's surplus).<ref>{{cite journal|last=Adami|first=Christoph|author2=Arend Hintze|title=Evolutionary instability of Zero Determinant strategies demonstrates that winning isn't everything|journal=Nature Communications|volume=4|year=2013|page=3|arxiv=1208.2666|doi=10.1038/ncomms3193|pmid=23903782|pmc=3741637|bibcode=2013NatCo...4.2193A}}</ref>
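The abundance-update step itself can be sketched with deterministic replicator dynamics (an assumed simplification of the stochastic process described above; the payoff matrix here is the ordinary one-shot prisoner's dilemma, so defectors spread):

```python
def replicator_step(x, A, dt=0.01):
    """One Euler step of replicator dynamics: each strategy's share grows
    in proportion to its payoff advantage over the population average."""
    f = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    fbar = sum(xi * fi for xi, fi in zip(x, f))
    return [xi + dt * xi * (fi - fbar) for xi, fi in zip(x, f)]

# Strategies: 0 = cooperate, 1 = defect; payoffs R, S, T, P = 3, 0, 5, 1.
A = [[3, 0],
     [5, 1]]
x = [0.5, 0.5]
for _ in range(1000):
    x = replicator_step(x, A)
# the defector share x[1] approaches 1 while the shares still sum to 1
```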




Theory and simulations confirm that beyond a critical population size, ZD extortion loses out in evolutionary competition against more cooperative strategies, and as a result, the average payoff in the population increases when the population is larger. In addition, there are some cases in which extortioners may even catalyze cooperation by helping to break out of a face-off between uniform defectors and [[win–stay, lose–switch]] agents.<ref name=Hilbe2013 />




While extortionary ZD strategies are not stable in large populations, another ZD class called "generous" strategies ''is'' both stable and robust. In fact, when the population is not too small, these strategies can supplant any other ZD strategy and even perform well against a broad array of generic strategies for iterated prisoner's dilemma, including win–stay, lose–switch. This was proven specifically for the [[Prisoner's dilemma#Special case: Donation game|donation game]] by Alexander Stewart and Joshua Plotkin in 2013.<ref name=Stewart2013>{{cite journal|last=Stewart|first=Alexander J.|author2=Joshua B. Plotkin|title=From extortion to generosity, evolution in the Iterated Prisoner's Dilemma|journal=[[Proceedings of the National Academy of Sciences of the United States of America]]|year=2013|doi=10.1073/pnas.1306246110|pmid=24003115|volume=110|issue=38|pages=15348–53|bibcode=2013PNAS..11015348S|pmc=3780848}}</ref> Generous strategies will cooperate with other cooperative players, and in the face of defection, the generous player loses more utility than its rival. Generous strategies are the intersection of ZD strategies and so-called "good" strategies, which were defined by Akin (2013)<ref name=Akin2013>{{cite arxiv|last=Akin|first=Ethan|title=Stable Cooperative Solutions for the Iterated Prisoner's Dilemma|year=2013|page=9|class=math.DS|eprint=1211.0969}} {{bibcode|2012arXiv1211.0969A}}</ref> to be those for which the player responds to past mutual cooperation with future cooperation and splits expected payoffs equally if he receives at least the cooperative expected payoff. Among good strategies, the generous (ZD) subset performs well when the population is not too small. If the population is very small, defection strategies tend to dominate.<ref name=Stewart2013 />




===Continuous iterated prisoner's dilemma===

Most work on the iterated prisoner's dilemma has focused on the discrete case, in which players either cooperate or defect, because this model is relatively simple to analyze. However, some researchers have looked at models of the continuous iterated prisoner's dilemma, in which players are able to make a variable contribution to the other player. Le and Boyd<ref>{{cite journal | last1 = Le | first1 = S. | last2 = Boyd | first2 = R. |name-list-format=vanc| year = 2007 | title = Evolutionary Dynamics of the Continuous Iterated Prisoner's Dilemma | url = | journal = Journal of Theoretical Biology | volume = 245 | issue = 2| pages = 258–67 | doi = 10.1016/j.jtbi.2006.09.016 | pmid = 17125798 }}</ref> found that in such situations, cooperation is much harder to evolve than in the discrete iterated prisoner's dilemma. The basic intuition for this result is straightforward: in a continuous prisoner's dilemma, if a population starts off in a non-cooperative equilibrium, players who are only marginally more cooperative than non-cooperators get little benefit from [[Assortative mating|assorting]] with one another. By contrast, in a discrete prisoner's dilemma, tit for tat cooperators get a big payoff boost from assorting with one another in a non-cooperative equilibrium, relative to non-cooperators. Since nature arguably offers more opportunities for variable cooperation rather than a strict dichotomy of cooperation or defection, the continuous prisoner's dilemma may help explain why real-life examples of tit for tat-like cooperation are extremely rare in nature (ex. Hammerstein<ref>Hammerstein, P. (2003). Why is reciprocity so rare in social animals? A protestant appeal. In: P. Hammerstein, Editor, Genetic and Cultural Evolution of Cooperation, MIT Press. pp. 83–94.

</ref>) even though tit for tat seems robust in theoretical models.



===Emergence of stable strategies===

Players cannot seem to coordinate mutual cooperation, and thus often get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies.<ref>{{cite book|last=Spaniel|first=William|title=Game Theory 101: The Complete Textbook|year=2011}}</ref> Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay, lose-shift. This strategy outperforms a simple tit-for-tat strategy&nbsp;– that is, if you can get away with cheating, repeat that behavior; however, if you get caught, switch.<ref>{{cite journal|last=Nowak|first=Martin|author2=Karl Sigmund|title=A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game|journal=Nature|year=1993|volume=364|issue=6432|doi=10.1038/364056a0|pages=56–58|pmid=8316296|bibcode=1993Natur.364...56N}}</ref>
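The win-stay, lose-shift rule is short enough to state directly (a minimal sketch; "winning" is taken to be receiving ''R'' or ''T'', i.e. the opponent cooperated last round, which is the usual convention):

```python
def wsls(my_hist, opp_hist):
    """Win-stay, lose-shift: repeat the previous move after a good payoff
    (opponent cooperated, so R or T); switch after a bad one (S or P)."""
    if not my_hist:
        return 'C'                                  # cooperate on move one
    if opp_hist[-1] == 'C':                         # "win": stay
        return my_hist[-1]
    return 'D' if my_hist[-1] == 'C' else 'C'       # "lose": shift

# Against an unconditional cooperator, one accidental defection
# "gets away with cheating", so WSLS keeps repeating it.
w_hist, c_hist = [], []
for t in range(8):
    move = 'D' if t == 2 else wsls(w_hist, c_hist)  # error injected at t = 2
    w_hist.append(move)
    c_hist.append('C')
```

Against another win-stay, lose-shift player, by contrast, a single error produces one round of mutual defection (both players "lose"), and mutual cooperation resumes on the round after.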




The only problem with this tit-for-tat strategy is that it is vulnerable to signal error. The problem arises when one individual cheats in retaliation but the other interprets it as an unprovoked defection. As a result, the second individual now cheats in turn, and this starts a see-saw pattern of cheating in a chain reaction.
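The see-saw can be reproduced in a few lines (a minimal sketch; a single move is corrupted to model the signal error):

```python
def tit_for_tat(opp_hist):
    """Cooperate on the first move, then copy the opponent's last move."""
    return opp_hist[-1] if opp_hist else 'C'

def play(rounds, error_round):
    a_hist, b_hist = [], []
    for t in range(rounds):
        a = tit_for_tat(b_hist)
        b = tit_for_tat(a_hist)
        if t == error_round:
            a = 'D'   # A's intended cooperation is garbled into a defection
        a_hist.append(a)
        b_hist.append(b)
    return a_hist, b_hist

a_hist, b_hist = play(12, error_round=5)
# before the error both always cooperate; afterwards the retaliation
# echoes back and forth, with exactly one player defecting each round
```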




==Real-life examples==

The prisoner setting may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the [[social science]]s such as [[economics]], [[politics]], and [[sociology]], as well as to the biological sciences such as [[ethology]] and [[evolutionary biology]]. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.




===Environmental studies===

In [[environmental studies]], the PD is evident in crises such as global [[climate change|climate-change]]. It is argued that all countries will benefit from a stable climate, but any single country is often hesitant to curb [[Carbon dioxide|{{Co2}}]] emissions. The immediate benefit to any one country from maintaining current behavior is wrongly perceived to be greater than the purported eventual benefit to that country if all countries' behavior were changed, which explains the impasse concerning climate-change in 2007.<ref>{{cite news|newspaper=[[The Economist]]|url=http://www.economist.com/finance/displaystory.cfm?story_id=9867020|title=Markets & Data|date=2007-09-27}}</ref>




An important difference between climate-change politics and the prisoner's dilemma is uncertainty; the extent and pace at which pollution can change climate is not known. The dilemma faced by government is therefore different from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests that states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a possible climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.<ref>{{cite web|last=Rehmeyer|first=Julie|title=Game theory suggests current climate negotiations won't avert catastrophe|url=https://www.sciencenews.org/article/game-theory-suggests-current-climate-negotiations-won%E2%80%99t-avert-catastrophe|work=Science News|publisher=Society for Science & the Public|date=2012-10-29}}</ref>




Osang and Nandy (2003) provide a theoretical explanation with proofs for a regulation-driven win-win situation along the lines of [[Michael Porter]]'s hypothesis, in which government regulation of competing firms is substantial.<ref>{{cite thesis|type=paper|url= http://faculty.smu.edu/tosang/pdf/regln0803.pdf|first=Thomas|last=Osang|first2=Arundhati|last2=Nandyyz|date=August 2003|title=Environmental Regulation of Polluting Firms: Porter's Hypothesis Revisited}}</ref>




===Animals===

Cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Animals often engage in long-term partnerships, which can be modeled more specifically as an iterated prisoner's dilemma. For example, [[guppy|guppies]] inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.




[[Vampire bats]] are social animals that engage in reciprocal food exchange. Applying the payoffs from the prisoner's dilemma can help explain this behavior:<ref>{{cite book|last=Dawkins|first=Richard|title=The Selfish Gene|year=1976|publisher=Oxford University Press}}</ref>


* C/C: "Reward: I get blood on my unlucky nights, which saves me from starving. I have to give blood on my lucky nights, which doesn't cost me too much."

* D/C: "Temptation: You save my life on my poor night. But then I get the added benefit of not having to pay the slight cost of feeding you on my good night."

* C/D: "Sucker's Payoff: I pay the cost of saving your life on my good night. But on my bad night you don't feed me and I run a real risk of starving to death."

* D/D: "Punishment: I don't have to pay the slight costs of feeding you on my good nights. But I run a real risk of starving on my poor nights."



===Psychology===

In [[addiction]] research / [[behavioral economics]], [[George Ainslie (psychologist)|George Ainslie]] points out<ref>{{cite book |first=George|last=Ainslie |title=Breakdown of Will |year=2001 |isbn=978-0-521-59694-7}}</ref> that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, ''defecting'' means ''relapsing'', and it is easy to see that not defecting both today and in the future is by far the best outcome. The case where one abstains today but relapses in the future is the worst outcome&nbsp;– in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted" because the future relapse means that the addict is right back where he started and will have to start over (which is quite demoralizing, and makes starting over more difficult). Relapsing today and tomorrow is a slightly "better" outcome, because while the addict is still addicted, they haven't put the effort in to trying to stop. The final case, where one engages in the addictive behavior today while abstaining "tomorrow" will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.




[[John Gottman]] in his research described in "the science of trust" defines good relationships as those where partners know not to enter the (D,D) cell or at least not to get dynamically stuck there in a loop.




===Economics===

The prisoner's dilemma has been called the ''[[Escherichia coli|E. coli]]'' of social psychology, and it has been used widely to research various topics such as [[Oligopoly|oligopolistic]] competition and collective action to produce a collective good.<ref>{{Cite journal|last=Axelrod|first=Robert|date=1980|title=Effective Choice in the Prisoner's Dilemma|journal=The Journal of Conflict Resolution|volume=24|issue=1|pages=3–25|issn=0022-0027|jstor=173932|doi=10.1177/002200278002400101|url=https://semanticscholar.org/paper/fd1ab82470446bfb12c39f0c577644291027cf76}}</ref>




Advertising is sometimes cited as a real-life example of the prisoner's dilemma. When [[cigarette advertising]] was legal in the United States, competing cigarette manufacturers had to decide how much money to spend on advertising. The effectiveness of Firm A's advertising was partially determined by the advertising conducted by Firm B. Likewise, the profit derived from advertising for Firm B is affected by the advertising conducted by Firm A. If both Firm A and Firm B chose to advertise during a given period, then the advertisement from each firm negates the other's, receipts remain constant, and expenses increase due to the cost of advertising. Both firms would benefit from a reduction in advertising. However, should Firm B choose not to advertise, Firm A could benefit greatly by advertising. Nevertheless, the optimal amount of advertising by one firm depends on how much advertising the other undertakes. As the best strategy is dependent on what the other firm chooses there is no dominant strategy, which makes it slightly different from a prisoner's dilemma. The outcome is similar, though, in that both firms would be better off were they to advertise less than in the equilibrium. Sometimes cooperative behaviors do emerge in business situations. For instance, cigarette manufacturers endorsed the making of laws banning cigarette advertising, understanding that this would reduce costs and increase profits across the industry.{{Citation needed|reason=This reference doesn't mention or support the claimed historical account.|date=December 2012}}{{efn|1=This argument for the development of cooperation through trust is given in ''[[The Wisdom of Crowds]]'', where it is argued that long-distance [[capitalism]] was able to form around a nucleus of [[Religious Society of Friends|Quakers]], who always dealt honourably with their business partners. 
(Rather than defecting and reneging on promises&nbsp;– a phenomenon that had discouraged earlier long-term unenforceable overseas contracts). It is argued that dealings with reliable merchants allowed the [[meme]] for cooperation to spread to other traders, who spread it further until a high degree of cooperation became a profitable strategy in general [[commerce]]}} This analysis is likely to be pertinent in many other business situations involving advertising.{{Citation needed|reason=This doesn't sound like cooperation|date=November 2012}}




Without enforceable agreements, members of a [[cartel]] are also involved in a (multi-player) prisoner's dilemma.<ref>{{Cite book|last1=Nicholson|first=Walter|year=2000|title=Intermediate microeconomics and its application|edition=8th|location=Fort Worth, TX|publisher=Dryden Press : Harcourt College Publishers|isbn=978-0-030-25916-6}}</ref> 'Cooperating' typically means keeping prices at a pre-agreed minimum level. 'Defecting' means selling under this minimum level, instantly taking business (and profits) from other cartel members. [[Anti-trust]] authorities want potential cartel members to mutually defect, ensuring the lowest possible prices for [[consumer]]s.




===Sport===

[[Doping in sport]] has been cited as an example of a prisoner's dilemma.<ref name="wired">{{cite journal|last=Schneier |first=Bruce |url=https://www.wired.com/opinion/2012/10/lance-armstrong-and-the-prisoners-dilemma-of-doping-in-professional-sports/ |title=Lance Armstrong and the Prisoners' Dilemma of Doping in Professional Sports &#124; Wired Opinion |journal=Wired |publisher=Wired.com |date=2012-10-26 |accessdate=2012-10-29}}</ref>




Two competing athletes have the option to use an illegal and/or dangerous drug to boost their performance. If neither athlete takes the drug, then neither gains an advantage. If only one does, then that athlete gains a significant advantage over their competitor, reduced by the legal and/or medical dangers of having taken the drug. If both athletes take the drug, however, the benefits cancel out and only the dangers remain, putting them both in a worse position than if neither had used doping.<ref name="wired" />

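The doping structure can be sketched with made-up utilities (the `edge` and `cost` values below are illustrative assumptions, not figures from the cited article): taking the drug dominates staying clean, yet mutual doping is worse than mutual abstention.

```python
# Illustrative doping payoffs: each athlete chooses "clean" or "dope".
# Doping gives an edge worth +3 but carries a health/legal cost of -1.
edge, cost = 3, 1

def payoff(me, other):
    u = 0
    if me == "dope":
        u -= cost          # the dangers apply regardless of outcome
        if other == "clean":
            u += edge      # advantage only if the rival stays clean
    elif other == "dope":
        u -= edge          # the rival's advantage is my loss
    return u

# Doping strictly dominates staying clean...
assert payoff("dope", "clean") > payoff("clean", "clean")
assert payoff("dope", "dope") > payoff("clean", "dope")
# ...yet both doping is worse than both staying clean: the dilemma.
assert payoff("dope", "dope") < payoff("clean", "clean")
```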



===International politics===

In [[international politics|international political theory]], the Prisoner's Dilemma is often used to demonstrate the coherence of [[strategic realism]], which holds that in international relations, all states (regardless of their internal policies or professed ideology), will act in their rational self-interest given [[anarchy (international relations)|international anarchy]]. A classic example is an arms race like the [[Cold War]] and similar conflicts.<ref>{{cite journal| title = Arms races as iterated prisoner's dilemma games | author = Stephen J. Majeski | journal = Mathematical and Social Sciences | volume = 7 | issue = 3 | pages = 253–66 | year = 1984 | doi=10.1016/0165-4896(84)90022-2}}</ref> During the Cold War the opposing alliances of [[NATO]] and the [[Warsaw Pact]] both had the choice to arm or disarm. From each side's point of view, disarming whilst their opponent continued to arm would have led to military inferiority and possible annihilation. Conversely, arming whilst their opponent disarmed would have led to superiority. If both sides chose to arm, neither could afford to attack the other, but both incurred the high cost of developing and maintaining a nuclear arsenal. If both sides chose to disarm, war would be avoided and there would be no costs.

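A minimal sketch of the arm/disarm game, with hypothetical utilities chosen only to match the ordering described above, confirms that mutual armament is the unique Nash equilibrium even though mutual disarmament pays both sides more.

```python
# Stylized arms-race payoffs (illustrative numbers). Each bloc chooses
# "arm" or "disarm"; entries are (NATO, Warsaw Pact) utilities.
payoffs = {
    ("arm",    "arm"):    (-2, -2),  # costly standoff
    ("arm",    "disarm"): ( 1, -3),  # military superiority
    ("disarm", "arm"):    (-3,  1),
    ("disarm", "disarm"): ( 0,  0),  # no war, no cost
}

def is_nash(a, b):
    """True if neither player gains by unilaterally switching."""
    other = {"arm": "disarm", "disarm": "arm"}
    return (payoffs[(a, b)][0] >= payoffs[(other[a], b)][0]
            and payoffs[(a, b)][1] >= payoffs[(a, other[b])][1])

nash = [cell for cell in payoffs if is_nash(*cell)]
print(nash)  # only ("arm", "arm"), though ("disarm", "disarm") pays more
```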



Although the 'best' overall outcome is for both sides to disarm, the rational course for both sides is to arm, and this is indeed what happened. Both sides poured enormous resources into military research and armament in a [[War of attrition (game)|war of attrition]] for the next thirty years until the Soviet Union could not withstand the economic cost.<ref>{{Citation|last=Kuhn|first=Steven|title=Prisoner's Dilemma|date=2019|url=https://plato.stanford.edu/archives/win2019/entries/prisoner-dilemma/|encyclopedia=The Stanford Encyclopedia of Philosophy|editor-last=Zalta|editor-first=Edward N.|edition=Winter 2019|publisher=Metaphysics Research Lab, Stanford University|access-date=2020-04-12}}</ref> The same logic could be applied in any similar scenario, be it economic or technological competition between sovereign states.




===Multiplayer dilemmas===

Many real-life dilemmas involve multiple players.<ref>Gokhale CS, Traulsen A. Evolutionary games in the multiverse. Proceedings of the National Academy of Sciences. 2010 Mar 23. 107(12):5500–04.</ref> Although metaphorical, [[Garrett Hardin|Hardin's]] [[tragedy of the commons]] may be viewed as an example of a multi-player generalization of the PD: each villager makes a choice for personal gain or restraint. The collective reward for unanimous (or even frequent) defection is very low payoffs (representing the destruction of the "commons"). A commons dilemma most people can relate to is washing the dishes in a shared house. By not washing dishes an individual saves time, but if every resident adopts that behavior, the collective cost is that no one has clean plates.

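The dish-washing commons can be sketched as a toy n-player game. The benefit and cost numbers below are invented for illustration: each wash is worth 1 to every resident but costs the washer 2 in time, so shirking is individually best while universal shirking is collectively worst.

```python
# Toy n-player commons: payoff to one resident given their own choice
# and the total number of residents who washed up.
def utility(i_washed, total_washers):
    """i_washed is 1 if this resident washed up, else 0."""
    benefit_per_wash, cost_of_washing = 1, 2
    return total_washers * benefit_per_wash - i_washed * cost_of_washing

n = 5
all_wash = utility(1, n)          # everyone washes: 5*1 - 2 = 3
all_shirk = utility(0, 0)         # nobody washes: 0
lone_shirker = utility(0, n - 1)  # I shirk while the rest wash: 4
# Shirking is individually best, yet universal shirking is worst:
assert lone_shirker > all_wash > all_shirk
```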



The commons are not always exploited: [[William Poundstone]], in a book about the prisoner's dilemma (see References below), describes a situation in New Zealand where newspaper boxes are left unlocked. It is possible for people to [[Excludability|take a paper without paying]] (''defecting'') but very few do, feeling that if they do not pay then neither will others, destroying the system. Subsequent research by [[Elinor Ostrom]], winner of the 2009 [[Nobel Memorial Prize in Economic Sciences]], hypothesized that the tragedy of the commons is oversimplified, with the negative outcome driven by outside pressures. Without such complicating pressures, groups communicate and manage the commons among themselves for their mutual benefit, enforcing social norms to preserve the resource and achieve the maximum good for the group, an example of effecting the best-case outcome for the PD.<ref>{{cite web|url=http://volokh.com/2009/10/12/elinor-ostrom-and-the-tragedy-of-the-commons/ |title=The Volokh Conspiracy " Elinor Ostrom and the Tragedy of the Commons |publisher=Volokh.com |date=2009-10-12 |accessdate=2011-12-17}}</ref>




==Related games==

===Closed-bag exchange===

[[File:Prisoner's Dilemma briefcase exchange (colorized).svg|thumb|The prisoner's dilemma as a briefcase exchange]]


[[Douglas Hofstadter]]<ref name="dh">{{cite book | first=Douglas R. | last=Hofstadter| authorlink=Douglas Hofstadter | title= Metamagical Themas: questing for the essence of mind and pattern | publisher= Bantam Dell Pub Group| year=1985 | isbn=978-0-465-04566-2|chapter= Ch.29 ''The Prisoner's Dilemma Computer Tournaments and the Evolution of Cooperation''.| title-link=Metamagical Themas}}</ref> once suggested that people often find problems such as the PD problem easier to understand when it is illustrated in the form of a simple game, or trade-off. One of several examples he used was "closed bag exchange":


{{quote|Two people meet and exchange closed bags, with the understanding that one of them contains money, and the other contains a purchase. Either player can choose to honor the deal by putting into his or her bag what he or she agreed, or he or she can defect by handing over an empty bag.}}

Defection always gives a game-theoretically preferable outcome.<ref>{{Cite web|url=https://users.auth.gr/kehagiat/Research/GameTheory/06GamesToPlay/Prisoner%27s_dilemma.htm#Closed_Bag_Exchange|title=Prisoner's dilemma - Wikipedia, the free encyclopedia|website=users.auth.gr|access-date=2020-04-12}}</ref>

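The dominance claim can be made concrete with invented valuations (the numbers below are assumptions, not from Hofstadter): whatever the other trader does, handing over an empty bag pays at least as well as honoring the deal.

```python
# Closed-bag exchange with illustrative valuations: what you receive in
# a full bag is worth 2 to you; what you give up in your own full bag
# is worth 1 to you (you value the trade, hence the surplus).
def trader_payoff(my_bag, their_bag):
    gain = 2 if their_bag == "full" else 0  # value of what you receive
    cost = 1 if my_bag == "full" else 0     # value of what you hand over
    return gain - cost

# Defecting (empty bag) is strictly better against either choice:
for their_bag in ("full", "empty"):
    assert trader_payoff("empty", their_bag) > trader_payoff("full", their_bag)
```

As in the PD, two full bags (payoff 1 each) still beat two empty ones (payoff 0 each), which is exactly the dilemma.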



===''Friend or Foe?''===

''[[Friend or Foe? (TV series)|Friend or Foe?]]'' is a game show that aired from 2002 to 2005 on the [[Game Show Network]] in the US. It is an example of the prisoner's dilemma game tested on real people, but in an artificial setting. On the game show, three pairs of people compete. When a pair is eliminated, they play a game similar to the prisoner's dilemma to determine how the winnings are split. If they both cooperate (Friend), they share the winnings 50–50. If one cooperates and the other defects (Foe), the defector gets all the winnings and the cooperator gets nothing. If both defect, both leave with nothing. Notice that the reward matrix is slightly different from the standard one given above, as the rewards for the "both defect" and the "cooperate while the opponent defects" cases are identical. This makes the "both defect" case a weak equilibrium, compared with being a strict equilibrium in the standard prisoner's dilemma. If a contestant knows that their opponent is going to vote "Foe", then their own choice does not affect their own winnings. In a specific sense, ''Friend or Foe'' has a rewards model between prisoner's dilemma and the [[Chicken (game)|game of Chicken]].




The rewards matrix is


{| class="wikitable"
! {{diagonal split header|{{color|#009|Pair 1}}|{{color|#900|Pair 2}}}}
! scope="col" style="width:6em;" | {{color|#900|"Friend"<br />(cooperate)}}
! scope="col" style="width:6em;" | {{color|#900|"Foe"<br />(defect)}}
|-
! scope="row" style="width:6em;" | {{color|#009|"Friend"<br />(cooperate)}}
| {{diagonal split header|{{color|#009|1}}|{{color|#900|1}}|transparent}}
| {{diagonal split header|{{color|#009|0}}|{{color|#900|2}}|transparent}}
|-
! scope="row" | {{color|#009|"Foe"<br />(defect)}}
| {{diagonal split header|{{color|#009|2}}|{{color|#900|0}}|transparent}}
| {{diagonal split header|{{color|#009|0}}|{{color|#900|0}}|transparent}}
|}
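A quick check of this matrix (shares of a pot normalized to 2 units) shows why "both Foe" is only a weak equilibrium: a player facing a "Foe" vote earns 0 whichever way they themselves vote.

```python
# Friend-or-Foe payoffs, with the pot normalized to 2 units.
# Entries are (row player's share, column player's share).
payoffs = {("friend", "friend"): (1, 1),
           ("friend", "foe"):    (0, 2),
           ("foe",    "friend"): (2, 0),
           ("foe",    "foe"):    (0, 0)}

# Against a "foe" vote, your own choice makes no difference (0 either
# way), so "foe/foe" is only a weak equilibrium...
assert payoffs[("friend", "foe")][0] == payoffs[("foe", "foe")][0] == 0
# ...whereas in the standard PD, defecting against a defector is
# strictly better than cooperating, making mutual defection strict.
```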



This payoff matrix has also been used on the [[United Kingdom|British]] [[television]] programmes ''Trust Me'', ''[[Shafted]]'', ''[[The Bank Job (TV series)|The Bank Job]]'' and ''[[Golden Balls]]'', and on the [[United States|American]] shows ''[[Bachelor Pad]]'' and ''[[Take It All (game show)|Take It All]]''. Game data from the ''[[Golden Balls]]'' series has been analyzed by a team of economists, who found that cooperation was "surprisingly high" for amounts of money that would seem consequential in the real world, but were comparatively low in the context of the game.<ref>{{cite journal | ssrn=1592456 | title=Split or Steal? Cooperative Behavior When the Stakes Are Large | author=Van den Assem, Martijn J. | journal=Management Science |date=January 2012 | volume=58 | issue=1 | pages=2–20 | doi=10.1287/mnsc.1110.1413| url=http://faculty.chicagobooth.edu/richard.thaler/research/pdf/Split%20or%20Steal%20Cooperative%20Behavior%20When%20the%20Stakes%20Are%20Large.pdf }}</ref>




===Iterated snowdrift===

{{main|snowdrift game}}



Researchers from the [[University of Lausanne]] and the [[University of Edinburgh]] have suggested that the "Iterated Snowdrift Game" may more closely reflect real-world social situations. Although this model is actually a [[chicken game]], it will be described here. In this model, the risk of being exploited through defection is lower, and individuals always gain from taking the cooperative choice. The snowdrift game imagines two drivers who are stuck on opposite sides of a [[snowdrift]], each of whom is given the option of shoveling snow to clear a path, or remaining in their car. A player's highest payoff comes from leaving the opponent to clear all the snow by themselves, but the opponent is still nominally rewarded for their work.




This may better reflect real world scenarios, the researchers giving the example of two scientists collaborating on a report, both of whom would benefit if the other worked harder. "But when your collaborator doesn’t do any work, it’s probably better for you to do all the work yourself. You’ll still end up with a completed project."<ref>{{cite web|last=Kümmerli|first=Rolf|title='Snowdrift' game tops 'Prisoner's Dilemma' in explaining cooperation|url=http://phys.org/news111145481.html|accessdate=11 April 2012}}</ref>




{|
|-
|
{| class="wikitable" style="text-align: center;"
|+ Example snowdrift payouts (A, B)
! {{diagonal split header|&nbsp;A|B&nbsp;}} !! Cooperates !! Defects
|-
! Cooperates
| 200, 200 || 100, 300
|-
! Defects
| 300, 100 || 0, 0
|}
||
{| class="wikitable" style="text-align: center;margin-left:2em;"
|+ Example PD payouts (A, B)
! {{diagonal split header|&nbsp;A|B&nbsp;}} !! Cooperates !! Defects
|-
! Cooperates
| 200, 200 || -100, 300
|-
! Defects
| 300, -100 || 0, 0
|}
|}
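Using the two example payoff tables above (row player's payoff listed first), a best-reply check makes the structural difference explicit: in the PD defection is best no matter what, while in snowdrift the best reply to a defector is to cooperate, since shovelling alone (100) still beats a standoff (0).

```python
# The two payoff tables from the text; "C" = cooperates, "D" = defects.
snowdrift = {("C", "C"): (200, 200), ("C", "D"): (100, 300),
             ("D", "C"): (300, 100), ("D", "D"): (0, 0)}
pd        = {("C", "C"): (200, 200), ("C", "D"): (-100, 300),
             ("D", "C"): (300, -100), ("D", "D"): (0, 0)}

def best_reply(game, opponent_move):
    """The row player's payoff-maximizing reply to a fixed opponent move."""
    return max("CD", key=lambda my: game[(my, opponent_move)][0])

# In the PD, defecting is best whatever the opponent does...
assert best_reply(pd, "C") == "D" and best_reply(pd, "D") == "D"
# ...but in snowdrift the best reply to a defector is to cooperate:
assert best_reply(snowdrift, "D") == "C"
```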



===Coordination games===

{{main|coordination games}}

In coordination games, players must coordinate their strategies for a good outcome. An example is two cars that abruptly meet in a blizzard; each must choose whether to swerve left or right. If both swerve left, or both right, the cars do not collide. The local [[left- and right-hand traffic]] convention helps to co-ordinate their actions.

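The blizzard example can be written as a pure coordination game (payoffs below are illustrative): both (left, left) and (right, right) are equilibria, which is why an external convention such as the local traffic rule is needed to select one.

```python
# Pure coordination game: matching swerves avoids a collision (payoff 1),
# mismatching causes one (payoff 0). Numbers are illustrative.
payoffs = {("left", "left"): (1, 1), ("left", "right"): (0, 0),
           ("right", "left"): (0, 0), ("right", "right"): (1, 1)}

def is_nash(a, b):
    """True if neither driver gains by unilaterally switching sides."""
    other = {"left": "right", "right": "left"}
    return (payoffs[(a, b)][0] >= payoffs[(other[a], b)][0]
            and payoffs[(a, b)][1] >= payoffs[(a, other[b])][1])

equilibria = sorted(cell for cell in payoffs if is_nash(*cell))
print(equilibria)  # two symmetric equilibria; convention picks one
```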



Symmetrical co-ordination games include [[Stag hunt]] and [[Bach or Stravinsky]].




===Asymmetric prisoner's dilemmas===

A more general set of games are asymmetric. As in the prisoner's dilemma, the best outcome is co-operation, and there are motives for defection. Unlike the symmetric prisoner's dilemma, though, one player has more to lose and/or more to gain than the other. Some such games have been described as a prisoner's dilemma in which one prisoner has an [[alibi]], whence the term "alibi game".<ref>{{cite conference|last1=Robinson |first1=D.R. |last2=Goforth |first2=D.J. |title=Alibi games: the Asymmetric Prisoner' s Dilemmas |date=May 5, 2004 |url=https://economics.ca/2004/papers/0359.pdf |conference=Meetings of the Canadian Economics Association, Toronto, June 4-6, 2004}}</ref>




In experiments, players getting unequal payoffs in repeated games may seek to maximize profits, but only under the condition that both players receive equal payoffs; this may lead to a stable equilibrium strategy in which the disadvantaged player defects every X games, while the other always co-operates. Such behaviour may depend on the experiment's social norms around fairness.<ref>{{cite chapter|last1=Beckenkamp |first1=Martin |last2=Hennig-Schmidt |first2=Heike |last3=Maier-Rigaud |first3=Frank P. |chapter=Cooperation in Symmetric and Asymmetric Prisoner's Dilemma Games |date=March 4, 2007 |chapter-url=http://homepage.coll.mpg.de/pdf_dat/2006_25online.pdf |title=[[Max Planck Institute for Research on Collective Goods]]}}</ref>

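The "defect every X games" pattern can be sketched with hypothetical asymmetric payoffs (all numbers below are invented): mutual cooperation pays the advantaged player 6 but the disadvantaged player only 2, while a lone defection pays the defector 8 and the exploited co-operator 0. With X = 3, periodic defection by the disadvantaged player equalizes the totals.

```python
# Equalizing alternation in a hypothetical asymmetric repeated game.
coop_adv, coop_dis, temptation = 6, 2, 8
X = 3                              # disadvantaged player defects every X rounds
adv_total = dis_total = 0
for t in range(9):                 # three full X-round cycles
    if t % X == 0:                 # disadvantaged player defects
        dis_total += temptation    # the advantaged co-operator gets 0
    else:                          # mutual cooperation
        adv_total += coop_adv
        dis_total += coop_dis

print(adv_total, dis_total)  # → 36 36
```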



==Software==



Several software packages have been created to run prisoner's dilemma simulations and tournaments, some of which have available source code.


* The source code for the [[The Evolution of Cooperation|second tournament]] run by Robert Axelrod (written by Axelrod and many contributors in [[Fortran]]) is available [http://www-personal.umich.edu/~axe/research/Software/CC/CC2.html online]

* [https://web.archive.org/web/19991010053242/http://www.lifl.fr/IPD/ipd.frame.html Prison], a library written in [[Java (programming language)|Java]], last updated in 1998

* [https://github.com/Axelrod-Python/Axelrod Axelrod-Python], written in [[Python (programming language)|Python]]

* [http://selborne.nl/ipd/ play the Iterative Prisoner's Dilemma in the browser], play against strategies or let strategies play against other strategies



==In fiction==

[[Hannu Rajaniemi]] set the opening scene of his ''[[The Quantum Thief]]'' trilogy in a "dilemma prison". The main theme of the series has been described as the "inadequacy of a binary universe" and the ultimate antagonist is a character called the All-Defector. Rajaniemi is particularly interesting as an artist treating this subject in that he is a Cambridge-trained mathematician and holds a PhD in [[mathematical physics]]&nbsp;– the interchangeability of matter and information is a major feature of the books, which take place in a "post-singularity" future. The first book in the series was published in 2010, with the two sequels, ''[[The Fractal Prince]]'' and ''[[The Causal Angel]]'', published in 2012 and 2014, respectively.




A game modeled after the (iterated) prisoner's dilemma is a central focus of the 2012 video game ''[[Zero Escape: Virtue's Last Reward]]'' and a minor part in its 2016 sequel ''[[Zero Escape: Zero Time Dilemma]]''.




In ''The Mysterious Benedict Society and the Prisoner's Dilemma'' by [[Trenton Lee Stewart]], the main characters start by playing a version of the game and escaping from the "prison" altogether. Later they become actual prisoners and escape once again.




In ''[[The Adventure Zone]]: Balance'' during ''The Suffering Game'' subarc, the player characters are twice presented with the prisoner's dilemma during their time in two liches' domain, once cooperating and once defecting.




In ''[[Tiamat's Wrath]]'', the eighth novel by James S. A. Corey, Winston Duarte explains the prisoner's dilemma to his 14-year-old daughter, Teresa, to train her in strategic thinking. {{cn|date=April 2020}}




==See also==

{{div col|colwidth=18em}}

* [[Abilene paradox]]

* [[Centipede game]]

* [[Christmas truce]]

* [[Folk theorem (game theory)]]

* [[Free-rider problem]]

* [[Hobbesian trap]]

* [[Innocent prisoner's dilemma]]

* [[Liar Game]]

* [[Optional prisoner's dilemma]]

* [[Robert H. Frank#Prisoner's dilemma and cooperation|Prisoner's dilemma and cooperation]]

* [[Public goods game]]

* [[Gift-exchange game]]

* [[Reciprocal altruism]]

* [[Social preferences]]

* [[Swift trust theory]]

* [[Unscrupulous diner's dilemma]]

{{div col end}}



==References==

{{notelist}}

{{reflist|colwidth=30em}}



==Further reading==

{{refbegin|30em}}

* [[S.M. Amadae|Amadae, S.]] (2016). 'Prisoner's Dilemma,' ''Prisoners of Reason.'' [[Cambridge University Press]], NY, pp.&nbsp;24–61.

* {{cite book |first1=Robert |last1=Aumann |authorlink=Robert Aumann |chapter=Acceptable points in general cooperative ''n''-person games |editor1-first=R. D. |editor1-last=Luce |editor2-first=A. W. |editor2-last=Tucker |title=Contributions to the Theory 23 of Games IV |series=Annals of Mathematics Study |volume=40 |pages=287–324 |publisher=Princeton University Press |location=Princeton NJ |year=1959 |mr=0104521}}

* [[Robert Axelrod|Axelrod, R.]] (1984). ''[[The Evolution of Cooperation]]''. {{isbn|0-465-02121-2}}

* [[Cristina Bicchieri|Bicchieri, Cristina]] (1993). Rationality and Coordination. [[Cambridge University Press]].

* {{cite journal |first1=David M. |last1=Chess |date=December 1988 |title=Simulating the evolution of behavior: the iterated prisoners' dilemma problem |url=http://www.complex-systems.com/pdf/02-6-4.pdf |journal=Complex Systems |volume=2 |issue=6 |pages=663–70}}

* [[Melvin Dresher|Dresher, M.]] (1961). ''The Mathematics of Games of Strategy: Theory and Applications'' [[Prentice-Hall]], Englewood Cliffs, NJ.

* Greif, A. (2006). ''Institutions and the Path to the Modern Economy: Lessons from Medieval Trade.'' Cambridge University Press, [[Cambridge]], UK.

* [[Anatol Rapoport|Rapoport, Anatol]] and Albert M. Chammah (1965). ''Prisoner's Dilemma''. [[University of Michigan Press]].

{{refend}}



==External links==

*{{Commonscat-inline}}

* [http://plato.stanford.edu/entries/prisoner-dilemma/ Prisoner's Dilemma (''Stanford Encyclopedia of Philosophy'')]

* [http://www.msri.org/ext/larryg/pages/15.htm The Bowerbird's Dilemma] The Prisoner's Dilemma in ornithology&nbsp;– mathematical cartoon by Larry Gonick.

* [https://www.youtube.com/watch?v=_1SEXTVsxjk The Prisoner's Dilemma] The Prisoner's Dilemma with Lego minifigures.

* {{cite encyclopedia |last1=Dixit |first1=Avinash |authorlink1=Avinash Dixit |last2= Nalebuff |first2=Barry |authorlink2=Barry Nalebuff |editor=[[David R. Henderson]]|encyclopedia=[[Concise Encyclopedia of Economics]] |title=Prisoner's Dilemma |url=http://www.econlib.org/library/Enc/PrisonersDilemma.html |year=2008 |edition= 2nd |publisher=[[Library of Economics and Liberty]] |location=Indianapolis |isbn=978-0865976658 |oclc=237794267}}

* [http://gametheory101.com/The_Prisoner_s_Dilemma.html Game Theory 101: Prisoner's Dilemma]

* [https://www.youtube.com/watch?v=I71mjZefg8g Dawkins: Nice Guys Finish First]

* [https://axelrod.readthedocs.io/en/stable/ Axelrod] Iterated Prisoner's Dilemma [[Python (programming language)|Python]] library

* [http://gametheorygames.nl/index.html Play the Iterated Prisoner's Dilemma on gametheorygames.nl]

* [https://web.archive.org/web/20141011014608/http://demo.otree.org/demo/Prisoner%27s+Dilemma/ Play Prisoner's Dilemma on ''oTree''] (N/A 11-5-17)

* Nicky Case's [https://web.archive.org/web/20181229222135/https://ncase.me/trust/ Evolution of Trust], an example of the donation game

* [http://iterated-prisoners-dilemma.info Iterated Prisoner's Dilemma online game] by Wayne Davis

{{Decision theory paradoxes}}

{{Game theory}}



{{Authority control}}



[[Category:Non-cooperative games]]


[[Category:Thought experiments]]


[[Category:Dilemmas]]


[[Category:Environmental studies]]


[[Category:Social psychology]]


[[Category:Moral psychology]]


<noinclude>

<small>This page was moved from [[wikipedia:en:Prisoner's dilemma]]. Its edit history can be viewed at [[囚徒困境/edithistory]]</small></noinclude>

[[Category:待整理页面]]