Changes

Prisoner's dilemma (view source)

Revision as of 20:33, 26 September 2020 (Saturday)

305 bytes added, 20:33, 26 September 2020 (Saturday)

No edit summary

Line 640: Line 640:

In a stochastic iterated prisoner's dilemma game, strategies are specified in terms of "cooperation probabilities". In an encounter between player X and player Y, X's strategy is specified by a set of probabilities P of cooperating with Y. P is a function of the outcomes of their previous encounters or some subset thereof. If P is a function of only their most recent n encounters, it is called a "memory-n" strategy. A memory-1 strategy is then specified by four cooperation probabilities: <math>P=\{P_{cc},P_{cd},P_{dc},P_{dd}\}</math>, where <math>P_{ab}</math> is the probability that X will cooperate in the present encounter given that the previous encounter was characterized by (ab). For example, if the previous encounter was one in which X cooperated and Y defected, then <math>P_{cd}</math> is the probability that X will cooperate in the present encounter. If each of the probabilities is either 1 or 0, the strategy is called deterministic. An example of a deterministic strategy is the tit for tat strategy, written as P={1,0,1,0}, in which X responds as Y did in the previous encounter. Another is the win–stay, lose–switch strategy, written as P={1,0,0,1}, in which X responds as in the previous encounter if it was a "win" (i.e. cc or dc) but changes strategy if it was a loss (i.e. cd or dd). It has been shown that for any memory-n strategy there is a corresponding memory-1 strategy which gives the same statistical results, so that only memory-1 strategies need be considered.

−

在随机迭代囚徒困境博弈中，策略由“合作概率”来确定。在玩家 x 和玩家 y 之间的遭遇中，x 的策略由一组与 y 合作的概率 p 确定，p 是他们之前遭遇的结果的函数，或者是其中的一些子集。如果 p 只是它们最近遇到次数 n 的函数，那么它被称为“记忆-n”策略。我们可以用四个合作概率确定一个记忆-1策略: ~~math~~ p { cc }、 p { cd }、 p { dc }、 p { dd } / math，其中 math p { ab } / math 是 x 在前一次遭遇中合作的概率，而 x 在当前遭遇中合作的概率为拥有属性(ab)。例如，如果前一次遭遇中 x 合作而 y 叛变，那么数学 p { cd } / math 就是 x 在当前遭遇中合作的概率。如果每个概率都是1或0，这种策略称为确定性策略。确定性策略的一个例子是以牙还牙策略，写成 p {1,0,1,0} ，其中 x 的反应和 y 在前一次遭遇中的反应一样。另一种是胜-保持-败-转换策略，它被写成 p {1,0,0,1} ，在这种策略中，如果 x 获得胜利(即:cc 或 dc)，x会做出与上一次遭遇一样的反应，但如果失败，x会改变策略(即cd 或 dd)。研究表明，对于任何一种记忆-n 策略，存在一个相应的记忆-1策略，这个策略给出相同的统计结果，因此只需要考虑记忆-1策略。

+

在随机迭代囚徒困境博弈中，策略由“合作概率”来确定。在玩家 x 和玩家 y 之间的遭遇中，x 的策略由一组与 y 合作的概率 p 确定，p 是他们之前遭遇的结果的函数，或者是其中的一些子集。如果 p 只是它们最近遇到次数 n 的函数，那么它被称为“记忆-n”策略。我们可以用四个合作概率确定一个记忆-1策略:p { cc }、 p { cd }、 p { dc }、 p { dd } ，其中 math遭遇中合作的概率。如果每个概率都是1或0，这种策略称为确定性策略。确定性策略的一个例子是以牙还牙策略，写成 p {1,0,1,0} ，其中 x 的反应和 y 在前一次遭遇中的反应一样。另一种是胜-保持-败-转换策略，它被写成 p {1,0,0,1} ，在这种策略中，如果 x 获得胜利(即:cc 或 dc)，x会做出与上一次遭遇一样的反应，但如果失败，x会改变策略(即cd 或 dd)。研究表明，对于任何一种记忆-n 策略，存在一个相应的记忆-1策略，这个策略给出相同的统计结果，因此只需要考虑记忆-1策略。
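The memory-1 representation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `OUTCOMES`, `cooperation_probability`, `is_deterministic`, and `next_move` are hypothetical helpers introduced here, and the tuple order (cc, cd, dc, dd) follows the order of the strategy vector P in the text.

```python
import random

# Previous-round outcomes from X's point of view: (X's move, Y's move).
OUTCOMES = ("cc", "cd", "dc", "dd")

# Memory-1 strategies as cooperation probabilities (P_cc, P_cd, P_dc, P_dd).
TIT_FOR_TAT = (1.0, 0.0, 1.0, 0.0)
WIN_STAY_LOSE_SWITCH = (1.0, 0.0, 0.0, 1.0)


def cooperation_probability(strategy, previous_outcome):
    """Return the probability of cooperating given the previous outcome."""
    return strategy[OUTCOMES.index(previous_outcome)]


def is_deterministic(strategy):
    """A strategy is deterministic when every probability is 0 or 1."""
    return all(p in (0.0, 1.0) for p in strategy)


def next_move(strategy, previous_outcome, rng=random):
    """Sample 'c' (cooperate) or 'd' (defect) for the present encounter."""
    p = cooperation_probability(strategy, previous_outcome)
    # rng.random() lies in [0, 1), so p = 1 always cooperates and p = 0 never does.
    return "c" if rng.random() < p else "d"
```

For instance, `next_move(TIT_FOR_TAT, "cd")` returns `"d"`, since X copies Y's previous defection, while `next_move(WIN_STAY_LOSE_SWITCH, "dd")` returns `"c"`, since a loss makes X switch.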

Line 648: Line 648:

If we define P as the above 4-element strategy vector of X and <math>Q=\{Q_{cc},Q_{cd},Q_{dc},Q_{dd}\}</math> as the 4-element strategy vector of Y, a transition matrix M may be defined for X whose ij-th entry is the probability that the outcome of a particular encounter between X and Y will be j given that the previous encounter was i, where i and j each range over the four outcome indices: cc, cd, dc, or dd. For example, from X's point of view, the probability that the outcome of the present encounter is cd given that the previous encounter was cd is equal to <math>M_{cd,cd}=P_{cd}(1-Q_{dc})</math>. (The indices for Q are from Y's point of view: a cd outcome for X is a dc outcome for Y.) Under these definitions, the iterated prisoner's dilemma qualifies as a stochastic process and M is a stochastic matrix, allowing all of the theory of stochastic processes to be applied.

−

如果我们将 p 定义为 x 的上述4元策略向量，并将 ~~math~~ q { cc }、 q { cd }、 q { dc }、 q { dd } ~~/ math~~ 定义为 y 的4元策略向量，则对于 x 可以定义一个转移矩阵 m，该 x 的第 j 项是 x 和 y 之间特定遭遇的结果为 j 的概率，前一次遭遇为 i，其中 i 和 j 是 cc、 cd、 dc 或 dd 四个结果索引中的一个。例如，从 x 的角度来看，如果前一次遭遇的结果是 cd，那么这次遭遇的结果是 cd 的概率等于 m { cd，cd } p { cd }(1-Q { dc }) ~~/ math。~~(q 的指数是 y 的观点: x 的 cd 结果是 y 的 dc 结果)在这些定义下，重复的囚徒困境被定义为一个随机过程，m 是一个转移矩阵，允许所有的随机过程理论被应用。

+

如果我们将 p 定义为 x 的上述4元策略向量，并将 q { cc }、 q { cd }、 q { dc }、 q { dd } 定义为 y 的4元策略向量，则对于 x 可以定义一个转移矩阵 m，该 x 的第 j 项是 x 和 y 之间特定遭遇的结果为 j 的概率，前一次遭遇为 i，其中 i 和 j 是 cc、 cd、 dc 或 dd 四个结果索引中的一个。例如，从 x 的角度来看，如果前一次遭遇的结果是 cd，那么这次遭遇的结果是 cd 的概率等于 m { cd，cd } p { cd }(1-Q { dc }) 。(q 的指数是 y 的观点: x 的 cd 结果是 y 的 dc 结果)在这些定义下，重复的囚徒困境被定义为一个随机过程，m 是一个转移矩阵，允许所有的随机过程理论被应用。
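As a rough illustration of the transition matrix M defined above, the following Python sketch builds it from two memory-1 strategy vectors. The function name `transition_matrix` and the lookup table `SWAPPED` are hypothetical names chosen here, and the numeric strategies in the example are arbitrary values, not taken from the text.

```python
OUTCOMES = ("cc", "cd", "dc", "dd")

# Y indexes each outcome with the two moves swapped: X's cd is Y's dc.
SWAPPED = {"cc": "cc", "cd": "dc", "dc": "cd", "dd": "dd"}


def transition_matrix(P, Q):
    """Build M, where M[i][j] is the probability that the present outcome
    is OUTCOMES[j] given that the previous outcome was OUTCOMES[i],
    all from X's point of view."""
    M = []
    for prev in OUTCOMES:
        p = P[OUTCOMES.index(prev)]            # X cooperates now
        q = Q[OUTCOMES.index(SWAPPED[prev])]   # Y cooperates now (Y's indexing)
        # Columns in the order cc, cd, dc, dd.
        M.append([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    return M


# Arbitrary example strategies (assumed values, for illustration only).
P = (0.9, 0.2, 0.7, 0.1)
Q = (0.8, 0.3, 0.6, 0.4)
M = transition_matrix(P, Q)
```

Here `M[1][1]` equals `P[1] * (1 - Q[2])`, matching the entry M_{cd,cd} = P_{cd}(1 - Q_{dc}) from the text, and every row sums to 1, as expected for a stochastic matrix.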

Line 656: Line 656:

One result of stochastic theory is that there exists a stationary vector v for the matrix M such that <math>v\cdot M=v</math>. Without loss of generality, it may be specified that v is normalized so that the sum of its four components is unity. The ij-th entry in <math>M^n</math> will give the probability that the outcome of an encounter between X and Y will be j given that the encounter n steps previous was i. In the limit as n approaches infinity, <math>M^n</math> will converge to a matrix with fixed values, giving the long-term probabilities of an encounter producing j, which will be independent of i. In other words, the rows of <math>M^\infty</math> will be identical, giving the long-term equilibrium result probabilities of the iterated prisoner's dilemma without the need to explicitly evaluate a large number of interactions. It can be seen that v is a stationary vector for <math>M^n</math> and particularly <math>M^\infty</math>, so that each row of <math>M^\infty</math> will be equal to v. Thus the stationary vector specifies the equilibrium outcome probabilities for X. Defining <math>S_x=\{R,S,T,P\}</math> and <math>S_y=\{R,T,S,P\}</math> as the short-term payoff vectors for the {cc,cd,dc,dd} outcomes (from X's point of view), the equilibrium payoffs for X and Y can now be specified as <math>s_x=v\cdot S_x</math> and <math>s_y=v\cdot S_y</math>, allowing the two strategies P and Q to be compared for their long-term payoffs.

−

随机理论的一个结果是，矩阵 m 存在一个平稳向量 v，使得矩阵 m 是一个平稳向量，并且不失一般性，我们可以指定 v 是标准化的，因此它的4个组成部分之和是单位。数学 m ^ n ~~/ math~~ 中的 ij 项给出了 x 和 y 相遇的结果是 j 的概率，前面相遇 n 步的概率是 i。当 n 趋于无穷时，m 收敛于一个具有固定值的矩阵，并给出了产生 j 的长期概率，j 与 i ~~无关。换句话说，数学~~ m 的无限次方的行将是相同的，给出了重复囚徒困境的长期平衡结果概率，而不需要明确地计算大量的相互作用。可以看出，v 是数学 m ^ n ~~/ math 特别是数学~~ m ^ infty ~~/ math~~ 的平稳向量，因此数学 m的无限次方的每一行都等于 v，因此平稳向量指定 x 的平衡结果概率。将 s，r，s，t，p 和 s y，r，t，s，p 定义为{ cc，cd，dc，dd }结果的短期收益向量(从 x 的角度来看) ，x 和 y ~~的均衡收益现在可以指定为数学~~ ，使得两种P、Q策略能比较他们的长期回报。

+

随机理论的一个结果是，矩阵 m 存在一个平稳向量 v，使得矩阵 m 是一个平稳向量，并且不失一般性，我们可以指定 v 是标准化的，因此它的4个组成部分之和是单位。数学 m ^ n 中的 ij 项给出了 x 和 y 相遇的结果是 j 的概率，前面相遇 n 步的概率是 i。当 n 趋于无穷时，m 收敛于一个具有固定值的矩阵，并给出了产生 j 的长期概率，j 与 i 无关。换句话说， m 的无限次方的行将是相同的，给出了重复囚徒困境的长期平衡结果概率，而不需要明确地计算大量的相互作用。可以看出，v 是数学 m ^ n 特别是 m ^ infty 的平稳向量，因此数学 m的无限次方的每一行都等于 v，因此平稳向量指定 x 的平衡结果概率。将 s，r，s，t，p 和 s y，r，t，s，p 定义为{ cc，cd，dc，dd }结果的短期收益向量(从 x 的角度来看) ，x 和 y 的均衡收益现在可以指定为s_x=v\cdot S_x和s_y=v\cdot S_y ，使得两种P、Q策略能比较他们的长期回报。
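The stationary vector and equilibrium payoffs described above can be approximated numerically. The sketch below uses plain power iteration (repeatedly multiplying a starting vector by M), which is one simple technique among several and is not claimed to be the method used in the literature. The helper names, the example strategies, and the payoff values R=3, S=0, T=5, P=1 are all assumptions made for illustration. The iteration is assumed to converge, which holds here because every cooperation probability lies strictly between 0 and 1.

```python
OUTCOMES = ("cc", "cd", "dc", "dd")
SWAPPED = {"cc": "cc", "cd": "dc", "dc": "cd", "dd": "dd"}


def transition_matrix(P, Q):
    """Row i, column j: probability of outcome j given previous outcome i."""
    rows = []
    for prev in OUTCOMES:
        p = P[OUTCOMES.index(prev)]            # X cooperates
        q = Q[OUTCOMES.index(SWAPPED[prev])]   # Y cooperates (Y's own indexing)
        rows.append([p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)])
    return rows


def stationary_vector(M, iterations=10_000, tol=1e-13):
    """Approximate v with v.M = v by power iteration from a uniform start."""
    v = [0.25] * 4
    for _ in range(iterations):
        nxt = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Assumed payoff values (not given in the text): R=3, S=0, T=5, P=1.
R, S, T, P_PAYOFF = 3.0, 0.0, 5.0, 1.0
S_X = (R, S, T, P_PAYOFF)   # X's payoffs for cc, cd, dc, dd
S_Y = (R, T, S, P_PAYOFF)   # Y's payoffs for the same outcomes

# Arbitrary example strategies (assumed values).
P = (0.9, 0.2, 0.7, 0.1)
Q = (0.8, 0.3, 0.6, 0.4)

M = transition_matrix(P, Q)
v = stationary_vector(M)
s_x, s_y = dot(v, S_X), dot(v, S_Y)
print("stationary vector:", v)
print("long-term payoffs:", s_x, s_y)
```

Because v is normalized and each row of M sums to 1, the components of v keep summing to 1 throughout the iteration, so no explicit renormalization step is needed.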

Line 734: Line 734:

Players cannot seem to coordinate mutual cooperation, and thus often get locked into the inferior yet stable strategy of defection. In this way, iterated rounds facilitate the evolution of stable strategies. Iterated rounds often produce novel strategies, which have implications for complex social interaction. One such strategy is win-stay lose-shift. This strategy outperforms a simple Tit-For-Tat strategy – that is, if you can get away with cheating, repeat that behavior, but if you get caught, switch.

−

玩家似乎不能协调相互合作，因此常常陷入劣势但稳定的叛逃策略。通过这种方式，迭代轮可以促进稳定策略的进化。多轮循环往往产生新颖的策略，这对复杂的社会互动有影响。其中一个策略就是“赢-留-输”的转变。这个策略比一个简单的以牙还牙策略要好——也就是说，如果你能逃脱作弊的惩罚，重复这个行为，但是如果你被抓住了，就改变策略。

+

玩家似乎不能协调相互合作，因此常常陷入劣势但稳定的叛变策略。通过这种方式，迭代回合可以促进稳定策略的进化。多轮循环往往产生新颖的策略，这对复杂的社会互动有影响。其中一个策略就是“赢-保持-输”的转变。这个策略比一个简单的以牙还牙策略要好——也就是说，如果你能逃脱作弊的惩罚，重复这个行为，但是如果你被抓住了，就改变策略。

Line 742: Line 742:

The only problem with this tit-for-tat strategy is that it is vulnerable to signal error. The problem arises when one individual cheats in retaliation but the other interprets it as plain cheating. As a result, the second individual now cheats, which starts a see-saw pattern of cheating in a chain reaction.

−

这种针锋相对策略的唯一问题是，它们很容易出现信号错误。当一个人在报复中作弊，而另一个人将其解释为欺骗时，问题就出现了。结果，第二个人现在作弊，然后它开始了一个连锁反应的作弊模式。

+

这种以牙还牙策略的唯一问题是，它们很容易出现信号错误。当一个人因报复而作弊，而另一个人将其单纯解释为欺骗时，问题就出现了。结果，第二个人现在作弊，然后在接下来的连锁反应中开始了反复交替的作弊模式。
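The see-saw pattern can be reproduced with a small deterministic simulation. The sketch below is a simplified model made up for illustration: two tit-for-tat players both start by cooperating, and a single erroneous defection by X is injected at one round (the function name `play_tit_for_tat` and the parameter `error_round` are hypothetical).

```python
def play_tit_for_tat(rounds, error_round=None):
    """Simulate two tit-for-tat players and return the list of outcomes.

    Each outcome is a two-letter string: X's move followed by Y's move.
    If error_round is given, X's move in that round is forced to 'd',
    modelling a single signal error.
    """
    history = []
    x_prev, y_prev = "c", "c"
    for t in range(rounds):
        # Tit for tat: cooperate first, then copy the opponent's last move.
        x_move = "c" if t == 0 else y_prev
        y_move = "c" if t == 0 else x_prev
        if t == error_round:
            x_move = "d"  # the one-off erroneous defection
        history.append(x_move + y_move)
        x_prev, y_prev = x_move, y_move
    return history


print(play_tit_for_tat(6))                 # no error: cooperation throughout
print(play_tit_for_tat(6, error_round=2))  # one error: alternating retaliation
```

Without an error the players cooperate in every round. With one error in round 2 the outcomes become `cc, cc, dc, cd, dc, cd`: each player keeps retaliating against the other's previous retaliation, and mutual cooperation is never restored in this model.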

Line 752: Line 752:

The prisoner setting may seem contrived, but there are in fact many examples in human interaction as well as interactions in nature that have the same payoff matrix. The prisoner's dilemma is therefore of interest to the social sciences such as economics, politics, and sociology, as well as to the biological sciences such as ethology and evolutionary biology. Many natural processes have been abstracted into models in which living beings are engaged in endless games of prisoner's dilemma. This wide applicability of the PD gives the game its substantial importance.

−

囚犯的设置看起来似乎是人为的，但实际上在人类交往以及自然界的交往中有许多具有相同收益矩阵的例子。囚徒困境是经济学、政治学、社会学等社会科学以及动物行为学、进化生物学等生物学研究的热点问题。许多自然过程都被抽象为生物进行无休止的囚徒困境博弈的模型。这种广泛的适用性，使游戏的实质性重要性。

+

囚犯的设置看起来似乎是人为的，但实际上在人类交往以及自然界的交互中有许多具有相同收益矩阵的例子。囚徒困境是经济学、政治学、社会学等社会科学以及动物行为学、进化生物学等生物学研究的热点问题。许多自然过程都被抽象为生物进行无休止的囚徒困境博弈的模型。这种广泛的适用性让博弈非常重要。

Line 762: Line 762:

In environmental studies, the PD is evident in crises such as global climate-change. It is argued that all countries will benefit from a stable climate, but any single country is often hesitant to curb carbon dioxide emissions. The immediate benefit to any one country from maintaining current behavior is wrongly perceived to be greater than the purported eventual benefit to that country if all countries' behavior were changed, which explains the impasse concerning climate-change in 2007.

−

在环境研究中，在诸如全球气候变化等危机中，这种差异显而易见。有人认为，所有国家都将从稳定的气候中受益，但是任何一个国家往往在遏制二氧化碳排放方面犹豫不决。人们错误地认为，如果所有国家的行为都改变，任何一个国家保持目前的行为所带来的直接好处都会大于所谓的最终好处，这就解释了2007年气候变化方面的僵局。

+

在环境研究中，在诸如全球气候变化等危机中，这种差异显而易见。有人认为，所有国家都将从稳定的气候中受益，但是任何一个国家往往都在遏制二氧化碳排放方面犹豫不决。人们错误地认为，如果所有国家的行为都改变，任何一个国家保持目前的行为所带来的直接好处都会大于所谓的最终好处，这就解释了2007年气候变化方面的僵局。

Line 770: Line 770:

An important difference between climate-change politics and the prisoner's dilemma is uncertainty; the extent and pace at which pollution can change climate is not known. The dilemma faced by government is therefore different from the prisoner's dilemma in that the payoffs of cooperation are unknown. This difference suggests that states will cooperate much less than in a real iterated prisoner's dilemma, so that the probability of avoiding a possible climate catastrophe is much smaller than that suggested by a game-theoretical analysis of the situation using a real iterated prisoner's dilemma.

−

气候变化政治与囚徒困境之间的一个重要区别是不确定性; 污染对气候变化的影响程度和速度尚不清楚。因此，政府面临的困境不同于囚徒困境，因为合作的回报是未知的。这种差异表明，各国之间的合作远远少于真正重复的囚徒困境中的合作，因此避免可能发生的气候灾难的可能性远远小于使用真正重复的囚徒困境进行的博弈论分析所提出的可能性。

+

气候变化政治与囚徒困境之间的一个重要区别是不确定性; 污染对气候变化的影响程度和速度尚不清楚。因此，政府面临的困境不同于囚徒困境，因为合作的回报是未知的。这种差异表明，各国之间的合作远远少于真正的迭代囚徒困境中的合作，因此避免可能发生的气候灾难的可能性远远小于使用真正的迭代囚徒困境进行的博弈论分析所提出的可能性。

Line 788: Line 788:

Cooperative behavior of many animals can be understood as an example of the prisoner's dilemma. Often animals engage in long-term partnerships, which can be more specifically modeled as an iterated prisoner's dilemma. For example, guppies inspect predators cooperatively in groups, and they are thought to punish non-cooperative inspectors.

−

许多动物的合作行为可以被理解为囚徒困境的一个例子。通常动物会有长期的伙伴关系，这种关系可以更具体地模拟为重复的囚徒困境。例如，孔雀鱼成群结队地合作检查捕食者，它们被认为是在惩罚不合作的检查员。

+

许多动物的合作行为可以被理解为囚徒困境的一个例子。通常动物会有长期的伙伴关系，这种关系可以更具体地模拟为重复的囚徒困境。例如，孔雀鱼成群结队地合作监察捕食者，它们被认为是在惩罚不合作的监察者。

Line 799: Line 799:

* C/C: "Reward: I get blood on my unlucky nights, which saves me from starving. I have to give blood on my lucky nights, which doesn't cost me too much."

−

+

合作/合作：回报：我在不幸运的晚上得到了能让我果腹的血，那在幸运的晚上我也应该分出点血，那不会花费多少。

* D/C: "Temptation: You save my life on my poor night. But then I get the added benefit of not having to pay the slight cost of feeding you on my good night."

−

+

叛变/合作：诱惑：你在我的不幸的夜里救了我，但在我的幸运夜我不会给你血以让我活的更好。

* C/D: "Sucker's Payoff: I pay the cost of saving your life on my good night. But on my bad night you don't feed me and I run a real risk of starving to death."

−

+

合作/叛变：可怜者的回报：在我的幸运夜我救了你的命，但在我的不幸夜里你没有救我，我有饿死的风险。

* D/D: "Punishment: I don't have to pay the slight costs of feeding you on my good nights. But I run a real risk of starving on my poor nights."

−

+

叛变/叛变：惩罚：我在我的幸运夜里不必付出代价来救你，但我在我的不幸夜里有挨饿的风险。

−

===Psychology===

Line 814: Line 813:

In addiction research / behavioral economics, George Ainslie points out that addiction can be cast as an intertemporal PD problem between the present and future selves of the addict. In this case, defecting means relapsing, and it is easy to see that not defecting both today and in the future is by far the best outcome. The case where one abstains today but relapses in the future is the worst outcome – in some sense the discipline and self-sacrifice involved in abstaining today have been "wasted" because the future relapse means that the addict is right back where they started and will have to start over (which is quite demoralizing, and makes starting over more difficult). Relapsing today and tomorrow is a slightly "better" outcome, because while the addict is still addicted, they haven't put effort into trying to stop. The final case, where one engages in the addictive behavior today while abstaining "tomorrow", will be familiar to anyone who has struggled with an addiction. The problem here is that (as in other PDs) there is an obvious benefit to defecting "today", but tomorrow one will face the same PD, and the same obvious benefit will be present then, ultimately leading to an endless string of defections.

−

在成瘾研究 / 行为经济学中，乔治 · 安斯利指出，成瘾可以被描述为成瘾者现在和未来自我之间的跨期 PD 问题。在这种情况下，叛逃意味着反复，很容易看出，不在今天和未来叛逃是迄今为止最好的结果。如果一个人今天戒了，但在将来又复吸，这是最糟糕的结果——从某种意义上来说，今天戒瘾所包含的纪律和自我牺牲已经被“浪费”了，因为未来的复吸意味着瘾君子又回到了他开始的地方，将不得不重新开始(这相当令人沮丧，也使得重新开始更加困难)。今天和明天复发是一个稍微“更好”的结果，因为当瘾君子仍然上瘾时，他们没有努力去尝试停止。最后一种情况，一个人在今天进行成瘾行为，而在明天弃权，这对于任何一个与成瘾作斗争的人来说都是熟悉的。这里的问题是(~~和其他公共安全部门一样~~) ~~，背叛“今天”有一个明显的好处，但明天一个人将面临同样的公共安全问题，同样明显的好处将出现，最终导致一连串无休止的背叛。~~

+

在成瘾研究 / 行为经济学中，乔治·安斯利指出，成瘾可以被描述为成瘾者现在和未来自我之间的跨期囚徒困境问题。在这种情况下，叛变意味着反复，很容易看出，不在今天和未来叛变是迄今为止最好的结果。如果一个人今天戒了，但在将来又复吸，这是最糟糕的结果——从某种意义上来说，今天戒瘾所包含的纪律和自我牺牲已经被“浪费”了，因为未来的复吸意味着瘾君子又回到了他开始的地方，将不得不重新开始(这相当令人沮丧，也使得重新开始更加困难)。今天和明天复发是一个稍微“更好”的结果，因为当瘾君子仍然上瘾时，他们没有努力去尝试停止。最后一种情况，一个人在今天进行成瘾行为，而在明天弃权，这对于任何一个与成瘾作斗争的人来说都是熟悉的。这里的问题是(和其他囚徒困境一样) ，背叛“今天”有一个明显的好处，但明天一个人将面临同样的囚徒困境，同样明显的好处将出现，最终导致一连串无休止的叛变。

Henry (153 edits)
