# AIXI模型

**AIXI** is a theoretical mathematical formalism for artificial general intelligence.
It combines Solomonoff induction with sequential decision theory.
AIXI was first proposed by Marcus Hutter in 2000^{[1]} and several results regarding AIXI are proved in Hutter's 2005 book *Universal Artificial Intelligence*.^{[2]}

AIXI is a reinforcement learning agent. It maximizes the expected total reward received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how much reward that program would generate depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
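
As a toy illustration (not AIXI itself, which sums over *all* programs of a universal Turing machine and is incomputable), the scheme above can be sketched for a hypothetical finite set of candidate environment programs, each weighted by $2^{-\mathrm{length}}$:

```python
# Toy sketch of AIXI-style action selection over a finite, hypothetical set
# of candidate environments. Each "program" is a (length_in_bits, reward_fn)
# pair; real AIXI ranges over all programs of a universal Turing machine.

def select_action(actions, programs):
    """Pick the action maximizing the 2**-length weighted one-step reward."""
    def weighted_reward(action):
        return sum(2.0 ** -length * reward_fn(action)
                   for length, reward_fn in programs)
    return max(actions, key=weighted_reward)

# Two hypothetical environments: a short program predicting reward for
# "left", and a longer (hence a priori less probable) one favoring "right".
programs = [
    (3, lambda a: 1.0 if a == "left" else 0.0),    # weight 1/8
    (10, lambda a: 5.0 if a == "right" else 0.0),  # weight 1/1024
]
print(select_action(["left", "right"], programs))  # "left": 1/8 > 5/1024
```

The short program dominates the mixture even though the long one promises a larger reward, which is Occam's razor at work.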

## Definition

AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment $\mu$. The interaction proceeds in time steps, from $t=1$ to $t=m$, where $m \in \mathbb{N}$ is the lifespan of the AIXI agent. At time step *t*, the agent chooses an action $a_t \in \mathcal{A}$ (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" $e_t \in \mathcal{E} = \mathcal{O} \times \mathbb{R}$, which consists of an "observation" $o_t \in \mathcal{O}$ (e.g., a camera image) and a reward $r_t \in \mathbb{R}$, distributed according to the conditional probability $\mu(o_t r_t \mid a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t)$, where $a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t$ is the "history" of actions, observations and rewards. The environment $\mu$ is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) which depends on the *full* history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is *unknown* to the AIXI agent. Furthermore, note again that $\mu$ is computable, that is, the observations and rewards received by the agent from the environment $\mu$ can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.^{[3]}

The *only* goal of the AIXI agent is to maximise $\sum_{t=1}^m r_t$, that is, the sum of rewards from time step 1 to $m$.

The AIXI agent is associated with a stochastic policy $\pi : (\mathcal{A} \times \mathcal{E})^* \rightarrow \mathcal{A}$, which is the function it uses to choose actions at every time step, where $\mathcal{A}$ is the space of all possible actions that AIXI can take and $\mathcal{E}$ is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) $\mu$ can also be thought of as a stochastic policy (which is a function): $\mu : (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \rightarrow \mathcal{E}$, where $*$ is the Kleene star operation.

In general, at time step $t$ (which ranges from 1 to $m$), AIXI, having previously executed actions $a_1 \dots a_{t-1}$ (often abbreviated in the literature as $a_{<t}$) and having observed the history of percepts $o_1 r_1 \ldots o_{t-1} r_{t-1}$ (which can be abbreviated as $e_{<t}$), chooses and executes in the environment the action $a_t$, defined as follows:^{[4]}

$$a_t := \arg \max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}$$

or, using parentheses to disambiguate precedence:

$$a_t := \arg \max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)} \right) \right) \right)$$

Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to [math]\displaystyle{ m - t }[/math] time steps ahead (that is, from [math]\displaystyle{ t }[/math] to [math]\displaystyle{ m }[/math]), weighs each of them by the complexity of programs [math]\displaystyle{ q }[/math] (that is, by [math]\displaystyle{ 2^{-\textrm{length}(q)} }[/math]) consistent with the agent's past (that is, the previously executed actions, [math]\displaystyle{ a_{\lt t} }[/math], and received percepts, [math]\displaystyle{ e_{\lt t} }[/math]) that can generate that future, and then picks the action that maximises expected future rewards.^{[3]}
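
The nested max-over-actions / sum-over-percepts structure of this definition is an expectimax recursion. Below is a minimal sketch, assuming a hypothetical finite action set, finite percept set, and a `mixture` function giving weighted percept probabilities (real AIXI's mixture over all programs of a universal Turing machine is not computable):

```python
# Expectimax sketch of the nested max/sum in AIXI's definition, for a
# hypothetical finite environment mixture.

def expectimax(t, m, history, actions, percepts, mixture):
    """Return (best value, best action) for time steps t..m.

    mixture(history, action, percept) -> weighted probability of `percept`
    given `history` and `action`; percepts are (observation, reward) pairs.
    """
    if t > m:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for a in actions:                       # max over actions
        value = 0.0
        for e in percepts:                  # sum over percepts
            p = mixture(history, a, e)
            if p == 0.0:
                continue
            _, reward = e
            future, _ = expectimax(t + 1, m, history + [(a, e)],
                                   actions, percepts, mixture)
            value += p * (reward + future)
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action

# Deterministic toy mixture: action "a" always yields reward 1, "b" yields 0.
mixture = lambda hist, a, e: 1.0 if e[1] == (1.0 if a == "a" else 0.0) else 0.0
value, action = expectimax(1, 2, [], ["a", "b"], [(None, 0.0), (None, 1.0)], mixture)
print(action, value)  # a 2.0
```

Note that the mixture need not be normalized; in AIXI it is a semimeasure weighted by program complexity.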

Let us break this definition down in order to understand it fully.

$o_t r_t$ is the "percept" (which consists of the observation $o_t$ and reward $r_t$) received by the AIXI agent at time step $t$ from the environment (which is unknown and stochastic). Similarly, $o_m r_m$ is the percept received by AIXI at time step $m$ (the last time step where AIXI is active).

$r_t + \ldots + r_m$ is the sum of rewards from time step $t$ to time step $m$, so AIXI needs to look into the future to choose its action at time step $t$.

$U$ denotes a monotone universal Turing machine, and $q$ ranges over all (deterministic) programs on the universal machine $U$, which receives as input the program $q$ and the sequence of actions $a_1 \dots a_m$ (that is, all actions), and produces the sequence of percepts $o_1 r_1 \ldots o_m r_m$. The universal Turing machine $U$ is thus used to "simulate" or compute the environment responses or percepts, given the program $q$ (which "models" the environment) and all actions of the AIXI agent: in this sense, the environment is "computable" (as stated above). Note that, in general, the program which "models" the *current* and actual environment (where AIXI needs to act) is unknown because the current environment is also unknown.

$\textrm{length}(q)$ is the length of the program $q$ (which is encoded as a string of bits). Note that $2^{-\textrm{length}(q)} = \frac{1}{2^{\textrm{length}(q)}}$. Hence, in the definition above, $\sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}$ should be interpreted as a mixture (in this case, a sum) over all computable environments (which are consistent with the agent's past), each weighted by its complexity $2^{-\textrm{length}(q)}$. Note that $a_1 \ldots a_m$ can also be written as $a_1 \ldots a_{t-1} a_t \ldots a_m$, where $a_1 \ldots a_{t-1} = a_{<t}$ is the sequence of actions already executed in the environment by the AIXI agent. Similarly, $o_1 r_1 \ldots o_m r_m = o_1 r_1 \ldots o_{t-1} r_{t-1} o_t r_t \ldots o_m r_m$, where $o_1 r_1 \ldots o_{t-1} r_{t-1}$ is the sequence of percepts produced by the environment so far.
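
As a small illustration of this inner sum (purely hypothetical: each "program" is represented only by its bit-length and the percept sequence it would produce, whereas real AIXI ranges over all programs of $U$):

```python
# The inner sum in AIXI's definition: the total prior weight of all programs
# consistent with a given percept history. Each extra bit of program length
# halves the weight, so short consistent programs dominate the mixture.
def mixture_weight(history, programs):
    """programs: list of (length_in_bits, percept_sequence_it_would_produce)."""
    return sum(2.0 ** -length for length, outputs in programs
               if outputs == history)

programs = [(2, "o1r1"), (5, "o1r1"), (3, "o2r2")]
print(mixture_weight("o1r1", programs))  # 2**-2 + 2**-5 = 0.28125
```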

Let us now put all these components together in order to understand this equation or definition.

At time step $t$, AIXI chooses the action $a_t$ where the function $\sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\textrm{length}(q)}$ attains its maximum.

### Parameters

The parameters to AIXI are the universal Turing machine *U* and the agent's lifetime *m*, which need to be chosen. The latter parameter can be removed by the use of discounting.
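
A brief sketch of why discounting removes the lifetime parameter: with a geometric discount factor $\gamma \in (0,1)$ (an assumed parameter, standard in reinforcement learning), the infinite sum of bounded rewards converges, so no finite horizon $m$ has to be fixed in advance.

```python
# Geometric discounting: sum_t gamma**t * r_t converges for bounded rewards,
# so the agent's value is well defined without a fixed lifetime m.
def discounted_value(rewards, gamma=0.9):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A constant reward of 1 forever approaches the limit 1 / (1 - gamma):
print(discounted_value([1.0] * 1000, gamma=0.9))  # ~10.0
```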

## The meaning of the word AIXI

According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by $\xi$ (the Greek letter xi), or e.g. it can stand for AI "crossed" (X) with induction (I). There are other interpretations.

## Optimality

AIXI's performance is measured by the expected total number of rewards it receives.
AIXI has been proven to be optimal in the following ways.^{[2]}

- Pareto optimality: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment.^{[citation needed]}
- Balanced Pareto optimality: like Pareto optimality, but considering a weighted sum of environments.
- Self-optimizing: a policy *p* is called self-optimizing for an environment $\mu$ if the performance of *p* approaches the theoretical maximum for $\mu$ when the length of the agent's lifetime (not time) goes to infinity. For environment classes where self-optimizing policies exist, AIXI is self-optimizing.

It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.^{[5]}

However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable.^{[6]}

## Computational aspects

Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXI*tl*, which performs at least as well as the provably best time *t* and space *l* limited agent.^{[2]} Another approximation to AIXI with a restricted environment class is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.^{[3]}^{[7]}
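
The flavor of such Monte Carlo approximations can be sketched generically (this is *not* Veness et al.'s FAC-CTW algorithm; the environment model `simulate` below is a hypothetical stand-in): estimate each action's value by averaging sampled rollouts and pick the best.

```python
import random

# Generic Monte Carlo action selection (a sketch of the idea behind
# sampling-based AIXI approximations): estimate each action's value by
# averaging random rollouts through an assumed environment model.
def mc_select_action(actions, simulate, rollouts=200, rng=None):
    """simulate(action, rng) -> total reward of one sampled rollout."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    def estimate(a):
        return sum(simulate(a, rng) for _ in range(rollouts)) / rollouts
    return max(actions, key=estimate)

# Hypothetical model: "safe" always yields 1; "risky" yields 2 with
# probability 0.25 (expected 0.5), so "safe" should be preferred.
simulate = lambda a, rng: 1.0 if a == "safe" else (2.0 if rng.random() < 0.25 else 0.0)
print(mc_select_action(["safe", "risky"], simulate))  # safe
```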

## See also

- Gödel machine

## References

- ↑ Hutter, Marcus (2000). *A Theory of Universal Artificial Intelligence based on Algorithmic Complexity*. arXiv:cs.AI/0004001. Bibcode 2000cs........4001H. https://archive.org/details/arxiv-cs0004001.
- ↑ ^{2.0} ^{2.1} ^{2.2} Hutter, Marcus (2005). *Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability*. Texts in Theoretical Computer Science, an EATCS Series. Springer. doi:10.1007/b138233. ISBN 978-3-540-22139-5. https://books.google.com/books?id=NP53iZGt4KUC.
- ↑ ^{3.0} ^{3.1} ^{3.2} Veness, Joel; Ng, Kee Siong; Hutter, Marcus; Uther, William; Silver, David (2009). "A Monte Carlo AIXI Approximation". arXiv:0909.0801 [cs.AI].
- ↑ Hutter, Marcus (2005). *Universal Artificial Intelligence*. Springer.
- ↑ Leike, Jan; Hutter, Marcus (2015). "Bad Universal Priors and Notions of Optimality" (PDF). Proceedings of the 28th Conference on Learning Theory.
- ↑ Soares, Nate. "Formalizing Two Problems of Realistic World-Models" (PDF). *Intelligence.org*. Retrieved 2015-07-19.
- ↑ "Playing Pacman using AIXI Approximation" (video). YouTube.

- Hutter, Marcus. "Universal Algorithmic Intelligence: A mathematical top->down approach"; also in *Artificial General Intelligence*, eds. B. Goertzel and C. Pennachin, Springer, 2007, pp. 227–290, doi:10.1007/978-3-540-68677-4_8.

Category:Optimal decisions Category:Decision theory Category:Machine learning

This page was moved from wikipedia:en:AIXI. Its edit history can be viewed at AIXI模型/edithistory