In the traditional formulation of the mutual information,

:<math> \operatorname{I}(X;Y)
  = \sum_{y \in Y} \sum_{x \in X} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}, </math>
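
The double sum can be evaluated directly from a tabulated joint distribution. The sketch below is illustrative only: the joint probabilities are a toy example, and base-2 logarithms are used so the result is in bits.

<syntaxhighlight lang="python">
from math import log2

def mutual_information(p_xy):
    """I(X;Y) = sum over x,y of p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():        # marginals by summing the joint
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

# Toy joint distribution in which X determines Y completely.
joint = {(1, 1): 1/3, (2, 2): 1/3, (3, 3): 1/3}
print(mutual_information(joint))          # log2(3) ≈ 1.585 bits
</syntaxhighlight>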

:<math> \operatorname{I}(X;Y)
  = \sum_{y \in Y} \sum_{x \in X} w(x,y) p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}, </math>

which places a weight <math>w(x,y)</math> on the probability of each variable value co-occurrence, <math>p(x,y)</math>. This allows that certain probabilities may carry more or less significance than others, thereby allowing the quantification of relevant ''holistic'' or ''[[Prägnanz]]'' factors.  In the above example, using larger relative weights for <math>w(1,1)</math>, <math>w(2,2)</math>, and <math>w(3,3)</math> would have the effect of assessing greater ''informativeness'' for the relation  <math>\{(1,1),(2,2),(3,3)\}</math> than for the relation <math>\{(1,3),(2,1),(3,2)\}</math>, which may be desirable in some cases of pattern recognition, and the like.  This weighted mutual information is a form of weighted KL-Divergence, which is known to take negative values for some inputs,<ref name="weighted-kl">{{cite journal | last1 = Kvålseth | first1 = T. O. | year = 1991 | title = The relative useful information measure: some comments | url = | journal = Information Sciences | volume = 56 | issue = 1| pages = 35–38 | doi=10.1016/0020-0255(91)90022-m}}</ref> and there are examples where the weighted mutual information also takes negative values.<ref>{{cite dissertation|title=Feature Selection Via Joint Likelihood|first=A. |last=Pocock|year=2012|url=http://www.cs.man.ac.uk/~gbrown/publications/pocockPhDthesis.pdf}}</ref>
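
As a rough illustration (the weight values below are arbitrary and not taken from the cited sources), the following sketch inserts the factor <math>w(x,y)</math> into the same sum. Up-weighting the pairs <math>(1,1)</math>, <math>(2,2)</math> and <math>(3,3)</math> scores the first relation above the second, even though their unweighted mutual information is identical.

<syntaxhighlight lang="python">
from math import log2

def weighted_mutual_information(p_xy, w):
    """Same double sum as above, with each term multiplied by w(x, y)."""
    p_x, p_y = {}, {}
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(w.get((x, y), 1.0) * p * log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

# The two relations discussed in the text; unweighted MI is log2(3) for both.
rel_diag  = {(1, 1): 1/3, (2, 2): 1/3, (3, 3): 1/3}
rel_other = {(1, 3): 1/3, (2, 1): 1/3, (3, 2): 1/3}
weights   = {(1, 1): 2.0, (2, 2): 2.0, (3, 3): 2.0}      # w = 1 for all other pairs
print(weighted_mutual_information(rel_diag,  weights))   # 2 * log2(3) ≈ 3.17
print(weighted_mutual_information(rel_other, weights))   # log2(3) ≈ 1.585
</syntaxhighlight>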

:<math>
\operatorname{I}_K(X;Y) = K(X) - K(X|Y).
</math>
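
Since Kolmogorov complexity is not computable, this quantity can only be approximated. A common heuristic, sketched below with arbitrary example strings, substitutes the length of a compressed encoding for <math>K</math> and estimates the conditional term through the chain rule <math>K(X|Y) \approx K(XY) - K(Y)</math>; the result is a rough proxy, not the quantity defined above.

<syntaxhighlight lang="python">
import os
import zlib

def c(data: bytes) -> int:
    """Compressed length, used here as a crude stand-in for K."""
    return len(zlib.compress(data, 9))

def approx_algorithmic_mi(x: bytes, y: bytes) -> int:
    # I_K(X;Y) = K(X) - K(X|Y), with K(X|Y) approximated by C(y + x) - C(y).
    return c(x) - (c(y + x) - c(y))

text = b"the quick brown fox jumps over the lazy dog " * 20
print(approx_algorithmic_mi(text, text))              # knowing y makes x cheap: large value
print(approx_algorithmic_mi(text, os.urandom(900)))   # unrelated noise: much smaller
</syntaxhighlight>
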
:<math>\operatorname{I} = -\frac{1}{2} \log\left(1 - \rho^2\right)</math>
 
:<math>\begin{align}
  \begin{pmatrix}
    X_1 \\
    X_2
  \end{pmatrix}  &\sim \mathcal{N} \left( \begin{pmatrix}
    \mu_1 \\
    \mu_2
  \end{pmatrix}, \Sigma \right),\qquad
  \Sigma = \begin{pmatrix}
    \sigma^2_1          & \rho\sigma_1\sigma_2 \\
    \rho\sigma_1\sigma_2 & \sigma^2_2
  \end{pmatrix} \\
  H(X_i) &= \frac{1}{2}\log\left(2\pi e \sigma_i^2\right) = \frac{1}{2} + \frac{1}{2}\log(2\pi) + \log\left(\sigma_i\right), \quad i\in\{1, 2\} \\
  H(X_1, X_2) &= \frac{1}{2}\log\left[(2\pi e)^2|\Sigma|\right] = 1 + \log(2\pi) + \log\left(\sigma_1 \sigma_2\right) + \frac{1}{2}\log\left(1 - \rho^2\right) \\
\end{align}</math>

:<math>
  \operatorname{I}\left(X_1; X_2\right)
= H\left(X_1\right) + H\left(X_2\right) - H\left(X_1, X_2\right)
= -\frac{1}{2}\log\left(1 - \rho^2\right)
</math>
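
A quick numerical check (the parameter values are arbitrary; natural logarithms are used, so the result is in nats) confirms that the three differential entropies above combine to <math>-\frac{1}{2}\log\left(1 - \rho^2\right)</math>.

<syntaxhighlight lang="python">
from math import e, log, pi

sigma1, sigma2, rho = 1.3, 0.7, 0.6                      # arbitrary parameters

h1  = 0.5 * log(2 * pi * e * sigma1 ** 2)                # H(X1)
h2  = 0.5 * log(2 * pi * e * sigma2 ** 2)                # H(X2)
det = sigma1 ** 2 * sigma2 ** 2 * (1 - rho ** 2)         # |Sigma|
h12 = 0.5 * log((2 * pi * e) ** 2 * det)                 # H(X1, X2)

print(h1 + h2 - h12)                                     # ≈ 0.2231 nats
print(-0.5 * log(1 - rho ** 2))                          # same value
</syntaxhighlight>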

{{Equation box 1
|indent=::
|equation=
<math>
MI(x,y) = \log \frac{P_{X,Y}(x,y)}{P_X(x) P_Y(y)} \approx \log \frac{\frac{f_{XY}}{B}}{\frac{f_X}{U} \frac{f_Y}{U}}
</math>
|cellpadding= 6
|border
|border colour = #0073CF
|background colour=#F5FFFA}}

: where <math>f_{XY}</math> is the number of times the bigram xy appears in the corpus, <math>f_{X}</math> is the number of times the unigram x appears in the corpus, B is the total number of bigrams, and U is the total number of unigrams.<ref name=magerman/>
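
The count-based estimate is easy to compute once the frequencies are available. In the sketch below the corpus statistics are invented toy numbers, used only to show that a strongly collocated bigram receives a much higher score than an incidental pairing.

<syntaxhighlight lang="python">
from math import log2

def pmi(f_xy, f_x, f_y, B, U):
    """log2 of (f_xy / B) divided by (f_x / U) * (f_y / U)."""
    return log2((f_xy / B) / ((f_x / U) * (f_y / U)))

B, U = 999_999, 1_000_000          # total bigram and unigram counts (toy values)
print(pmi(f_xy=150, f_x=200,   f_y=180,    B=B, U=U))    # strong collocation: ≈ 12.0
print(pmi(f_xy=2,   f_x=5_000, f_y=40_000, B=B, U=U))    # incidental pair: ≈ -6.6
</syntaxhighlight>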
 