更改

跳到导航 跳到搜索
添加1,310字节 、 2021年3月20日 (六) 18:06
无编辑摘要
第74行: 第74行:     
交叉算子在后续世代地研究中引入了大量地复杂性:在亲代中及其有效的个体(比如爱因斯坦)在后代中不会出现,因为该个体中地等位基因被分割开来,传到了不同地后代个体中。这就引发了一个重要的问题:如果不是亲代的染色体,那到底是什么在一代又一代中遗传着呢?回答这个问题,需要需要预测在一代又一代中扩散的等位基因团,还需要对费舍尔的基本理论做进一步的扩展和泛化。模式定理 schema theorem 就是这么一个泛化。模式定理把任意一个等位基因团成为“模式”。一个模式使用符号 * 来表示染色体中不属于该模式的位置。例如,如果一个位置中由两个不同的等位基因,分别为1和0,那么位置2为1,位置4为0,位置5为0的基因团就被表示成符号串*1*00**...*。用 N(s,t) 表示在第t代中携带模式s的染色体数目模式定理揭示了携带模式s的染色体在下一代中预期的数目<math>N(s,t+1)</math>。下面是一个简化后的版本:
 
交叉算子在后续世代地研究中引入了大量地复杂性:在亲代中及其有效的个体(比如爱因斯坦)在后代中不会出现,因为该个体中地等位基因被分割开来,传到了不同地后代个体中。这就引发了一个重要的问题:如果不是亲代的染色体,那到底是什么在一代又一代中遗传着呢?回答这个问题,需要需要预测在一代又一代中扩散的等位基因团,还需要对费舍尔的基本理论做进一步的扩展和泛化。模式定理 schema theorem 就是这么一个泛化。模式定理把任意一个等位基因团成为“模式”。一个模式使用符号 * 来表示染色体中不属于该模式的位置。例如,如果一个位置中由两个不同的等位基因,分别为1和0,那么位置2为1,位置4为0,位置5为0的基因团就被表示成符号串*1*00**...*。用 N(s,t) 表示在第t代中携带模式s的染色体数目模式定理揭示了携带模式s的染色体在下一代中预期的数目<math>N(s,t+1)</math>。下面是一个简化后的版本:
 +
    
<math>N(s,t+1) =  u(s,t)[1-e]N(s,t)</math>
 
<math>N(s,t+1) =  u(s,t)[1-e]N(s,t)</math>
 +
    
where <math>u(s,t)</math> is the average fitness of the chromosomes carrying schema <math>s</math> at time <math>t</math> (the observed average fitness), and <math>e</math> is the overall probability (usually quite small) that the cluster s will be destroyed (or created) by mutation or crossover.  (Note that <math>e</math> does become large if the alleles in the cluster are spread over a considerable part of the chromosome).  This formula for <math>N(s,t+1)</math> can be restated in terms of probabilities (proportions) <math>P(s,t)\ ,</math> a more typical form for mathematical genetics, by noting that <math>P(s,t) = N(s,t)/N(t)\ ,</math> were <math>N(t)</math> is the size of the population at time <math>t\ .</math>
 
where <math>u(s,t)</math> is the average fitness of the chromosomes carrying schema <math>s</math> at time <math>t</math> (the observed average fitness), and <math>e</math> is the overall probability (usually quite small) that the cluster s will be destroyed (or created) by mutation or crossover.  (Note that <math>e</math> does become large if the alleles in the cluster are spread over a considerable part of the chromosome).  This formula for <math>N(s,t+1)</math> can be restated in terms of probabilities (proportions) <math>P(s,t)\ ,</math> a more typical form for mathematical genetics, by noting that <math>P(s,t) = N(s,t)/N(t)\ ,</math> were <math>N(t)</math> is the size of the population at time <math>t\ .</math>
   −
其中 <math>u(s,t)</math>
+
其中 <math>u(s,t)</math> 指在第<math>t</math>代中携带模式<math>s</math>的染色体的平均适应度, <math>e</math> 指模式s会被变异算子和交叉算子破坏的概率(通常会很小,但如果模式中的等位基因在染色体中分布很广,即模式的长度很大,则<math>e</math>会很大)。上述公式可以被重写为一个概率值形式 <math>P(s,t)\ ,</math>,这是在数理遗传学中更加典型的形式:<math>P(s,t) = N(s,t)/N(t)\ ,</math> ,其中<math>N(t)</math>是群体在第<math>t\ .</math>代的规模。
       
The schema theorem shows that ''every'' cluster of alleles present in the population increases or decreases its proportion in the population at a rate determined by the observed average fitness of its carriers.  In particular, schemas consistently associated with strings of above-average fitness spread rapidly through the population.  As a result, crossover regularly tries out new combinations of these above-average schemas; they become building blocks for further attempts at solution. Though a GA only directly processes <math>N(t)</math> strings in a generation, it effectively processes the much larger number of schemata carried in the population.  For example, the number of schemas on a single string of length 40 is <math>2 * exp(40) \sim 16,000\ ,</math> so a GA processing a few dozen strings of length 40 effectively processes thousands of schemas.  For problems in which schemas capture regularities in the search space, this is a tremendous speedup.
 
The schema theorem shows that ''every'' cluster of alleles present in the population increases or decreases its proportion in the population at a rate determined by the observed average fitness of its carriers.  In particular, schemas consistently associated with strings of above-average fitness spread rapidly through the population.  As a result, crossover regularly tries out new combinations of these above-average schemas; they become building blocks for further attempts at solution. Though a GA only directly processes <math>N(t)</math> strings in a generation, it effectively processes the much larger number of schemata carried in the population.  For example, the number of schemas on a single string of length 40 is <math>2 * exp(40) \sim 16,000\ ,</math> so a GA processing a few dozen strings of length 40 effectively processes thousands of schemas.  For problems in which schemas capture regularities in the search space, this is a tremendous speedup.
 +
 +
模式定理说明了,群体中每一个等位基因团的比例都会提高活降低,期速率由它所能带来的平均适应度决定。特别地,能带来更高适应度的模式,会在群体中快速传播开来。结果就是,交叉算子会使用这些高于平均适应度的模式,尝试各种新的组合。这些模式成为了构建解的组件块 building blocks。尽管遗传算法每一代只处理<math>N(t)</math>个符号串,它能有效处理在群体中承载的多得多的模式。比如,一个长度为40的符号串中包含的模式的数量为<math>2 * exp(40) \sim 16,000\ ,</math>,因此遗传算法只要处理几十个长度为40的符号串,就能处理成千上万个模式。这是一个及其出色的速度。
     
370

个编辑

导航菜单