# Conditional Random Fields

## Markov Random Fields (MRF)

In a Markov random field, the joint distribution over the label variables $Y$ factorizes into a product of potential functions $\boldsymbol{\Psi}_C$ over the cliques $C$ of the graph, normalized by the partition function $Z$:

 $\displaystyle{ \boldsymbol{P(Y)=\frac{1}{Z} \prod_C \boldsymbol{\Psi}_C(Y_C)} }$

Each potential is usually written in terms of an energy function $E$:

 $\displaystyle{ \boldsymbol{\Psi_C(Y_C)=exp\{-E(Y_C)\}} }$
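The factorization can be made concrete with a toy example. The chain structure and the energy function below are illustrative choices, not from the text:

```python
import itertools
import math

# Toy MRF sketch: three binary variables Y = (Y1, Y2, Y3) on a chain,
# with cliques {(Y1,Y2), (Y2,Y3)} and a hand-picked energy function.
def energy(a, b):
    return 0.0 if a == b else 1.0   # lower energy when neighbors agree

def potential(a, b):
    return math.exp(-energy(a, b))  # Psi_C(Y_C) = exp{-E(Y_C)}

# Partition function Z: sum over all configurations of prod_C Psi_C(Y_C)
Z = sum(potential(y1, y2) * potential(y2, y3)
        for y1, y2, y3 in itertools.product([0, 1], repeat=3))

def prob(y1, y2, y3):
    """P(Y) = (1/Z) * prod_C Psi_C(Y_C)."""
    return potential(y1, y2) * potential(y2, y3) / Z

print(prob(0, 0, 0) > prob(0, 1, 0))  # True: agreeing configs more likely
```

Brute-force enumeration of $Z$ is only feasible for tiny graphs; it serves here to verify that the potentials define a proper distribution.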


## Conditional Random Fields (CRF)

A CRF is a Markov random field over the labels $y$ conditioned on the observation sequence $x$. For a linear-chain CRF, each label depends on the rest of the chain only through its immediate neighbors:

 $\displaystyle{ \boldsymbol{P(y_i|x,y_1,...,y_{i-1},y_{i+1},...,y_n)=P(y_i|x,y_{i-1},y_{i+1})} }$


## Formal Model

The model is built from binary transition features $t_k$ (defined on label pairs) and state features $s_l$ (defined on single labels):

$\displaystyle{ \boldsymbol{t_k(y_{i-1},y_i,x,i) \in \{0,1\}} }$

$\displaystyle{ \boldsymbol{s_l(y_i,x,i) \in \{0,1\}} }$

$\displaystyle{ \boldsymbol{f_k(y_{i-1},y_i,x,i)=\begin{cases} t_k(y_{i-1},y_i,x,i) & k=1,2,...,K_1 \\ s_l(y_i,x,i) & k=K_1+l;l=1,2,...,K_2 \end{cases}} }$

Grouping the weighted features position by position gives a local potential for each $i$ (the product over all positions yields the double sum in the distribution below):

$\displaystyle{ \boldsymbol{\Psi_i (y_{i-1},y_i)=exp\left(\sum_j \lambda_j f_j(y_{i-1},y_i,x,i)\right)} }$

$\displaystyle{ \boldsymbol{p_{\lambda}^*(y|x,\lambda)=\frac{1}{Z_{\lambda}(x)}exp\left(\sum_{i=1}^n \sum_j \lambda_j f_j(y_{i-1},y_i,x,i)\right)} }$

$\displaystyle{ \boldsymbol{Z_{\lambda}(x)=\sum_y exp\left(\sum_{i=1}^n \sum_j \lambda_j f_j(y_{i-1},y_i,x,i)\right)} }$

where the outer sum runs over all possible label sequences $y$.
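As a sanity check on these definitions, a brute-force implementation can enumerate every label sequence to compute $Z_{\lambda}(x)$ exactly. The feature functions, weights, and the fixed start label $y_0 = 0$ below are made-up illustrative choices:

```python
import itertools
import math

# Toy linear-chain CRF: 2 labels, a length-3 observation sequence.
LABELS = [0, 1]
x = ["a", "b", "a"]

def features(y_prev, y_cur, x, i):
    """Feature vector f(y_{i-1}, y_i, x, i) with 0/1 values."""
    return [
        1 if y_prev == y_cur else 0,               # transition feature t_1
        1 if (x[i] == "a" and y_cur == 1) else 0,  # state feature s_1
    ]

lam = [0.5, 1.2]  # weights lambda_j (arbitrary for illustration)

def score(y):
    """Unnormalized log-score: sum_i sum_j lambda_j f_j(y_{i-1}, y_i, x, i)."""
    s, y_prev = 0.0, 0  # assume a fixed start label y_0 = 0
    for i, y_cur in enumerate(y):
        for l, f in zip(lam, features(y_prev, y_cur, x, i)):
            s += l * f
        y_prev = y_cur
    return s

# Z_lambda(x): sum over all 2^3 label sequences y
all_y = list(itertools.product(LABELS, repeat=len(x)))
Z = sum(math.exp(score(y)) for y in all_y)
probs = {y: math.exp(score(y)) / Z for y in all_y}
print(sum(probs.values()))  # 1.0 (up to rounding)
```

Enumeration is exponential in the sequence length; real implementations compute $Z_{\lambda}(x)$ with the forward algorithm instead.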

## Limited-memory Quasi-Newton Method (L-BFGS)

### Newton's Method

Newton's method for root finding obtains the $(k+1)$-th approximation $x_{k+1}$ from the $k$-th approximation $x_k$ as the intersection of the tangent line at $x_k$ with the $x$-axis. The tangent line is:

$\displaystyle{ y-f(x_k) = \nabla f(x_k)(x-x_k) }$
$\displaystyle{ \Rightarrow 0-f(x_k) = \nabla f(x_k)(x-x_k) }$
$\displaystyle{ \Rightarrow \nabla f(x_k)\, x = \nabla f(x_k)\, x_k- f(x_k) }$
$\displaystyle{ \Rightarrow x = x_k - f(x_k)/\nabla f(x_k) }$


For minimization, approximate $f$ at $x_k$ by its second-order Taylor expansion $\varphi(x)$:

$\displaystyle{ f(x) \approx \varphi(x) = f(x_k) + f'(x_k)(x-x_k) + \frac{1}{2}f''(x_k)(x-x_k)^2 }$

Setting the derivative of $\varphi$ to zero gives the stationarity condition

$\displaystyle{ \varphi'(x) = f'(x_k) + f''(x_k)(x-x_k) = 0 }$

whose solution is the next iterate

$\displaystyle{ x_{k+1} = x_k-\frac{f'(x_k)}{f''(x_k)} }$


$\displaystyle{ x_{k+1} }$ serves as a refined estimate of a minimizer of $\displaystyle{ f(x) }$. Repeating this process produces a sequence of estimates $\displaystyle{ \{x_k\} }$, which under suitable conditions converges to a minimizer of $\displaystyle{ f(x) }$.
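The one-dimensional iteration can be tried on a hand-picked example (not from the text): $f(x) = x - \ln x$, whose unique minimizer on $x \gt 0$ is $x = 1$.

```python
# Minimize f(x) = x - ln(x) with the 1-D Newton iteration
# x_{k+1} = x_k - f'(x_k)/f''(x_k) from the text.
def newton_minimize(fp, fpp, x0, eps=1e-10, max_iter=100):
    """fp, fpp: first and second derivatives of the objective."""
    x = x0
    for _ in range(max_iter):
        g = fp(x)
        if abs(g) < eps:        # stop when |f'(x_k)| < eps
            break
        x = x - g / fpp(x)      # Newton step
    return x

# f'(x) = 1 - 1/x, f''(x) = 1/x^2; minimizer at x = 1
x_star = newton_minimize(lambda x: 1 - 1/x, lambda x: 1/x**2, x0=0.5)
print(x_star)  # ≈ 1.0
```

The iterates 0.5, 0.75, 0.9375, 0.996… show the quadratic convergence typical of Newton's method near the solution.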

In the multivariate case, the quadratic model uses the gradient and the Hessian:

$\displaystyle{ f(x) \approx \varphi(x) = f(x_k) + \nabla f(x_k)^T(x-x_k) + \frac{1}{2}(x-x_k)^T \nabla^2 f(x_k)(x-x_k) }$

and the Newton iteration becomes

$\displaystyle{ x_{k+1} = x_k -[\nabla^2 f(x_k)]^{-1}\nabla f(x_k) }$


The Newton algorithm is then:

(1) Choose an initial point $\displaystyle{ x_0 }$ and a precision $\displaystyle{ \varepsilon \gt 0 }$; set $\displaystyle{ k = 0 }$.

(2) Compute $\displaystyle{ \nabla f(x_k) }$ and $\displaystyle{ \nabla^2 f(x_k) }$.

(3) If $\displaystyle{ \Vert \nabla f(x_k) \Vert \lt \varepsilon }$, stop the iteration; otherwise determine the search direction $\displaystyle{ d_k = -[\nabla^2 f(x_k)]^{-1}\nabla f(x_k) }$.

(4) Starting from $\displaystyle{ x_k }$, perform a line search along $\displaystyle{ d_k }$:

$\displaystyle{ \min_{\lambda} f(x_k + \lambda d_k) = f(x_k + \lambda_k d_k) }$,

$\displaystyle{ x_{k+1} = x_k + \lambda_k d_k }$

(5) Set $\displaystyle{ k = k+1 }$ and return to step (2).

### Quasi-Newton Methods

Quasi-Newton methods avoid computing the Hessian directly. Expanding $f$ to second order around $x_{k+1}$:

$\displaystyle{ f(x) \approx f(x_{k+1}) + \nabla f(x_{k+1})(x-x_{k+1}) + \frac{1}{2}(x-x_{k+1})^T \nabla^2 f(x_{k+1})(x-x_{k+1}) }$


and differentiating with respect to $x$:

$\displaystyle{ \nabla f(x) \approx \nabla f(x_{k+1}) + \nabla^2 f(x_{k+1})(x-x_{k+1}) }$


Setting $\displaystyle{ x = x_k }$ yields

$\displaystyle{ \nabla f(x_{k+1}) - \nabla f(x_k) \approx \nabla^2 f(x_{k+1})(x_{k+1} - x_k) \qquad (1) }$


Define

$\displaystyle{ s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k) }$


so that (1) becomes

$\displaystyle{ s_k \approx [\nabla^2f(x_{k+1})]^{-1}y_k \qquad (2) }$


Let $\displaystyle{ H_{k+1} }$ approximate the inverse of the Hessian $\displaystyle{ \nabla^2 f(x_{k+1}) }$ used in Newton's method. Then $\displaystyle{ H_{k+1} }$ must satisfy the quasi-Newton (secant) condition

$\displaystyle{ s_k \approx H_{k+1}y_k \qquad (3) }$


DFP formula

The DFP method updates the inverse-Hessian approximation additively:

$\displaystyle{ H_{k+1}=H_k + \Delta H_k }$


The DFP algorithm defines the correction matrix as

$\displaystyle{ \Delta H_k = \frac{s_k s^T_k}{s^T_k y_k}- \frac{H_k y_k y^T_k H_k}{y^T_k H_k y_k} }$


$\displaystyle{ H_{k+1} = H_k + \frac{s_k s^T_k}{s^T_k y_k}- \frac{H_k y_k y^T_k H_k}{y^T_k H_k y_k} (4) }$


BFGS formula

The BFGS formula is sometimes called the dual of the DFP formula, because its derivation is exactly analogous to that of equation (4): replace $\displaystyle{ H^{-1}_{k+1} }$ by a matrix $\displaystyle{ B_{k+1} }$ and swap the roles of $\displaystyle{ s_k }$ and $\displaystyle{ y_k }$. This finally gives

$\displaystyle{ H_{k+1} = H_k + \left[1+\frac{y^T_k H_k y_k}{s^T_k y_k}\right] \frac{s_k s^T_k}{s^T_k y_k} - \frac{s_k y^T_k H_k + H_k y_k s^T_k}{s^T_k y_k} \qquad (5) }$


The BFGS algorithm proceeds as follows.

(1) Choose an initial point $\displaystyle{ x_0 }$, a precision $\displaystyle{ \epsilon \gt 0 }$ and a restart period $\displaystyle{ N }$; set $\displaystyle{ k = 0 }$.

(2) Set $\displaystyle{ H_0 = I }$ and compute the gradient $\displaystyle{ g_k = \nabla f(x_k) }$ of the objective $\displaystyle{ f(x) }$ at $\displaystyle{ x_k }$.

(3) Determine the search direction $\displaystyle{ d_k = -H_k g_k }$.

(4) Starting from $\displaystyle{ x_k }$, perform a line search along $\displaystyle{ d_k }$ satisfying

$\displaystyle{ f(x_k + \lambda_k d_k) = \min_{\lambda \ge 0}f(x_k + \lambda d_k) }$

$\displaystyle{ x_{k+1} = x_k + \lambda_k d_k }$.

(5) If $\displaystyle{ \Vert \nabla f(x_{k+1}) \Vert \le \epsilon }$, stop the iteration; the solution is $\displaystyle{ x = x_{k+1} }$. Otherwise go to step (6).

(6) If $\displaystyle{ k = N-1 }$, set $\displaystyle{ x_0 = x_{k+1} }$ and return to step (2); otherwise go to step (7).

(7) Set $\displaystyle{ g_{k+1} = \nabla f(x_{k+1}) }$, $\displaystyle{ s_k = x_{k+1} - x_k }$, $\displaystyle{ y_k = g_{k+1}-g_k }$; compute $\displaystyle{ H_{k+1} }$ from equation (4) or (5); set $\displaystyle{ k = k+1 }$ and return to step (3).
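The BFGS steps can be sketched in code. The quadratic objective and the exact line search below are illustrative choices (a quadratic admits a closed-form step length; in general a Wolfe line search would be used), and the update is applied in the product form $H \leftarrow V^T H V + \rho s s^T$, which is algebraically equivalent to the rank-two update above:

```python
import numpy as np

# BFGS sketch on a small quadratic f(x) = 0.5 x^T A x - b^T x,
# with gradient A x - b and unique minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A @ x - b

x = np.zeros(2)
H = np.eye(2)                  # H_0 = I approximates the inverse Hessian
g = grad(x)
for k in range(20):
    if np.linalg.norm(g) < 1e-8:
        break
    d = -H @ g                 # search direction d_k = -H_k g_k
    lam = -(g @ d) / (d @ A @ d)   # exact line search for a quadratic
    s = lam * d                # s_k = x_{k+1} - x_k
    x_new = x + s
    g_new = grad(x_new)
    y = g_new - g              # y_k = g_{k+1} - g_k
    rho = 1.0 / (y @ s)
    V = np.eye(2) - rho * np.outer(y, s)
    H = V.T @ H @ V + rho * np.outer(s, s)   # BFGS inverse-Hessian update
    x, g = x_new, g_new

print(x)  # ≈ A^{-1} b = [0.2, 0.4]
```

With an exact line search on an $n$-dimensional quadratic, BFGS reaches the minimizer in at most $n$ iterations.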

### L-BFGS Method

The basic idea of L-BFGS is to store the update matrices $\displaystyle{ \Delta H_k }$ for only a limited number of iterations (say $\displaystyle{ m }$); once $\displaystyle{ k \gt m }$, the updates older than $\displaystyle{ \Delta H_{k-m+1} }$ are discarded, i.e. L-BFGS has a memory of only $\displaystyle{ m }$ updates. If $\displaystyle{ m = N }$, L-BFGS is equivalent to the standard BFGS formula.

With $\displaystyle{ \rho_k = 1/(y^T_k s_k) }$ and $\displaystyle{ v_k = I - \rho_k y_k s^T_k }$, the BFGS update (5) can be written in product form:

$\displaystyle{ H_{k+1} = (I-\rho_k s_k y^T_k) H_k (I- \rho_k y_k s^T_k) + \rho_k s_k s^T_k = v^T_k H_k v_k + \rho_k s_k s^T_k }$ (6)


Unrolling recursion (6) down to $\displaystyle{ H_0 }$: if $\displaystyle{ k+1 \le m }$, then

$\displaystyle{ H_{k+1} = v^T_k v^T_{k-1} \cdots v^T_0 H_0 v_0 \cdots v_{k-1}v_k }$
$\displaystyle{ + v^T_k \cdots v^T_1 \rho_0 s_0 s^T_0 v_1 \cdots v_k }$
$\displaystyle{ + v^T_k v^T_{k-1} \cdots v^T_2 \rho_1 s_1s^T_1 v_2 \cdots v_k }$
$\displaystyle{ + \cdots }$
$\displaystyle{ + v^T_k v^T_{k-1} \rho_{k-2} s_{k-2} s^T_{k-2} v_{k-1} v_k }$
$\displaystyle{ + v^T_k \rho_{k-1} s_{k-1} s^T_{k-1} v_k }$
$\displaystyle{ + \rho_k s_k s^T_k \qquad (9) }$


If $\displaystyle{ k+1 \gt m }$, then

$\displaystyle{ H_{k+1} = v^T_k v^T_{k-1} \cdots v^T_{k-m+1} H_0 v_{k-m+1} \cdots v_{k-1} v_k }$
$\displaystyle{ + v^T_k \cdots v^T_{k-m+2} \rho_{k-m+1} s_{k-m+1} s^T_{k-m+1} v_{k-m+2} \cdots v_k }$
$\displaystyle{ + \cdots }$
$\displaystyle{ + v^T_k \rho_{k-1} s_{k-1} s^T_{k-1} v_k }$
$\displaystyle{ + \rho_k s_k s^T_k \qquad (10) }$


Determining the search direction $\displaystyle{ d_k }$ in L-BFGS requires computing $\displaystyle{ H \cdot g }$; the following two-loop algorithm computes it efficiently:

IF $\displaystyle{ k \le m }$ THEN

  $\displaystyle{ incr = 0; BOUND = k }$

ELSE

  $\displaystyle{ incr = k-m; BOUND = m }$

ENDIF

$\displaystyle{ q_{BOUND} = g_k }$;

DO $\displaystyle{ i = (BOUND-1), 0, -1 }$

  $\displaystyle{ j = i + incr }$;
  $\displaystyle{ \alpha_i = \rho_j s^T_j q_{i + 1} }$;
  store $\displaystyle{ \alpha_i }$;
  $\displaystyle{ q_i = q_{i+1} - \alpha_i y_j }$;

ENDDO

$\displaystyle{ r_0 = H_0 \cdot q_0 }$;

DO $\displaystyle{ i = 0, (BOUND-1) }$

  $\displaystyle{ j = i + incr }$;
  $\displaystyle{ \beta_j = \rho_j y^T_j r_i }$;
  $\displaystyle{ r_{i+1} = r_i + s_j(\alpha_i - \beta_j) }$;

ENDDO
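The pseudocode above translates directly into Python. This is a sketch: $H_0$ is taken as the identity, and the single-pair check at the end verifies the result against the dense product-form update (6):

```python
import numpy as np

# L-BFGS two-loop recursion: computes H_k * g from the m most recent
# (s_j, y_j) pairs without ever forming H_k. Index bookkeeping follows
# the pseudocode above (incr, BOUND).
def two_loop(g, s_list, y_list, m):
    """s_list, y_list hold all past pairs; only the last m are used."""
    k = len(s_list)
    bound = min(k, m)
    incr = max(k - m, 0)
    rho = [1.0 / (y_list[incr + i] @ s_list[incr + i]) for i in range(bound)]
    q = g.copy()
    alpha = [0.0] * bound
    for i in range(bound - 1, -1, -1):      # first loop: newest to oldest
        j = i + incr
        alpha[i] = rho[i] * (s_list[j] @ q)
        q -= alpha[i] * y_list[j]
    r = q                                    # H_0 = I, so r_0 = q_0
    for i in range(bound):                   # second loop: oldest to newest
        j = i + incr
        beta = rho[i] * (y_list[j] @ r)
        r += (alpha[i] - beta) * s_list[j]
    return r                                 # r ≈ H_k g

# Sanity check: with a single pair, the result must match the dense
# update H_1 = (I - rho s y^T)(I - rho y s^T) + rho s s^T applied to g.
s = np.array([1.0, 0.5]); y = np.array([0.8, 0.3]); g = np.array([0.2, -1.0])
rho = 1.0 / (y @ s)
V = np.eye(2) - rho * np.outer(y, s)
H1 = V.T @ V + rho * np.outer(s, s)
print(np.allclose(two_loop(g, [s], [y], m=5), H1 @ g))  # True
```

The recursion costs $O(mn)$ time and memory per iteration, versus $O(n^2)$ for storing and applying a dense $H_k$.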
