Cavity method and belief propagation for the Ising model

来自集智百科
跳到导航 跳到搜索

Variational approach using the Bethe approximation

The Boltzmann distribution is

[math]\displaystyle{ P(\underline\sigma)=\frac{1}{Z}e^{-\beta E(\sigma)} }[/math]
[math]\displaystyle{ Z=\sum_{\underline\sigma}e^{-\beta E(\sigma)} }[/math]

The Bethe approximation asks to use

[math]\displaystyle{ \hat P(\underline \sigma)=\frac{\prod_{\langle ij\rangle}P_{ij}(\sigma_i,\sigma_j)}{\prod_iP_i^{d_i-1}(\sigma_i)} }[/math]

Here [math]\displaystyle{ \textstyle\langle ij\rangle }[/math] denotes pair of nodes that is connected in the topology of the Ising model. That is, [math]\displaystyle{ \textstyle J_{ ij}\gt 0, }[/math] [math]\displaystyle{ \textstyle d_i }[/math] is the degree of node i in the topology, and

[math]\displaystyle{ \textstyle P_{ij}(\sigma_i,\sigma_j)=\sum_{\underline\sigma\backslash\sigma_i,\sigma_j}P(\underline\sigma), }[/math],
[math]\displaystyle{ \textstyle P_{i}(\sigma_i) =\sum_{\underline\sigma\backslash\sigma_i}P(\underline\sigma) }[/math] are two-point and one-point marginals respectively.

Then the task is to find parameters of the Bethe approximation, such that

[math]\displaystyle{ \{\hat P_{ij},\hat P_i \}=\arg\min_{P_{ij},P_i}D_{\textrm{KL}}(P,\hat P). }[/math]

And we can write the Kellback-Leibler divergence as

[math]\displaystyle{ \begin{align} D_{\textrm{KL}}(P,\hat P) &= \sum_{\underline\sigma}\hat P(\underline\sigma)\log\frac{\hat P(\underline\sigma)}{ P(\underline\sigma)}\\ &=\sum_{\underline\sigma}\hat P(\underline\sigma)\log \hat P(\underline\sigma)-\sum_{\underline\sigma}\hat P(\underline\sigma)\log P(\underline\sigma)\\ &=-S_{\hat P}+\sum_{\underline\sigma} \hat{P}(\underline\sigma)\beta E(\underline\sigma)+\sum_{\underline\sigma}\hat P(\underline\sigma)\log Z\\ &=-S_{\hat P}+\beta\langle E\rangle_{\hat P} +\log Z\\ &=\beta F_{\hat P}-\beta F_{P}\\ &\geq 0. \end{align} }[/math]

It means that

  • A good estimate of parameters in the approximation can be obtained by minimizing the variational free energy [math]\displaystyle{ \textstyle F_{\hat P} }[/math].
  • The variational free energy gives an upper-bound to the true free energy.
[math]\displaystyle{ \{\hat P_{ij},\hat P_i \}=\arg\min_{P_{ij},P_i}F_{\hat P}. }[/math]

Since

[math]\displaystyle{ \hat P(\underline \sigma)=\frac{\prod_{\langle ij\rangle}P_{ij}(\sigma_i,\sigma_j)}{\prod_iP_i^{d_i-1}(\sigma_i)}, }[/math]

we have

[math]\displaystyle{ \log \hat{P}(\underline\sigma)=\sum_{\langle ij\rangle}\log P_{ij}-\sum_i(d_i-1)\log P_i. }[/math]

Thus

[math]\displaystyle{ \begin{align} F_{\hat P}&=\sum_{\underline\sigma}\hat P(\underline\sigma)\sum_{\langle ij\rangle}J_{ij}\sigma_i\sigma_j+\sum_{\underline\sigma}\hat P(\underline\sigma)\sum_{\langle ij\rangle}\log P_{ij}-\sum_{\underline\sigma}\hat P(\underline\sigma)(d_i-1)\log P_i\\ &=\sum_{\langle ij\rangle}\sum_{\sigma_i\sigma_j}P_{ij}(\sigma_i,\sigma_j)(E_{ij}+\log P_{ij})-\sum_i\sum_{\sigma_i}P_i(\sigma_i)(d_i-1)\log P_i(\sigma_i) \end{align} }[/math]

Now let us introduce several Lagragnian with several multipliers

[math]\displaystyle{ \begin{align} \mathcal L&=\sum_{\langle ij\rangle}\sum_{\sigma_i\sigma_j}P_{ij}(\sigma_i,\sigma_j)(E_{ij}+\log P_{ij})-\sum_i\sum_{\sigma_i}P_i(\sigma_i)(d_i-1)\log P_i(\sigma_i)\\ &-\sum_i\lambda_i\left (\sum_{\sigma_i}P_i(\sigma_i)-1\right )-\sum_{\langle ij\rangle}\lambda_{j\to i}\left ( \sum_{\sigma_j} P_{ij}(\sigma_i,\sigma_j)-P_i(\sigma_i)\right)-\sum_{\langle ij\rangle}\lambda_{i\to j}\left ( \sum_{\sigma_i} P_{ij}(\sigma_i,\sigma_j)-P_j(\sigma_j)\right) \end{align} }[/math]

By setting derivatives to zero we have

[math]\displaystyle{ \begin{align} \frac{\partial \mathcal L}{\partial P_{i}(\sigma_i)}&=(1-d_i)(1+\log P_i(\sigma_i))-\lambda_i-\sum_{\langle ij\rangle}\lambda_{j\to i}=0,\\ \frac{\partial \mathcal L}{\partial P_{ij}(\sigma_i,\sigma_j)}&=E_{ij}(1+\log P_{ij})-\lambda_{j\to i}-\lambda_{i\to j}=0. \end{align} }[/math]

Last equations yield that

[math]\displaystyle{ \begin{align} P_i(\sigma_i)&\propto e^{\sum_{\langle ij\rangle}\lambda_{j\to i}},\\ P_{ij}(\sigma_i,\sigma_j)&\propto e^{-E_{ij}-\lambda_{j\to i}-\lambda_{i\to j}}. \end{align} }[/math]

After introducing new variables, we have

[math]\displaystyle{ \begin{align} P_i(\sigma_i)&\propto \prod_{j\in\partial i}\psi^{j\to i},\\ P_{ij}(\sigma_i,\sigma_j)&\propto e^{-E_{ij}}\psi^{j\to i}\psi^{i\to j}. \end{align} }[/math]

where [math]\displaystyle{ \textstyle \{\psi_{i\to j}\} }[/math] are cavity messages along the directed edges of the graph, and they can be determined using the Belief Propagation equation (also called Sum-Product equation):

[math]\displaystyle{ \begin{align} \psi_{i\to j}(\sigma_j)&=\frac{1}{Z_{i\to j}}\sum_{\sigma_i}e^{\beta E_{ij}(\sigma_i,\sigma_j)}\prod_{k\in\partial i\backslash j}\psi_{k\to i}(\sigma_i),\\ Z_{i\to j}&=\sum_{\sigma_j}\sum_{\sigma_i}e^{\beta E_{ij}(\sigma_i,\sigma_j)}\prod_{k\in\partial i\backslash j}\psi_{k\to i}(\sigma_i). \end{align} }[/math]

Computing macro observables

[math]\displaystyle{ \begin{align} F&=\sum_{i}\Delta F_i-\sum_{\langle ij\rangle}\Delta F_{ij},\\ \Delta F_i&=-\frac{1}{\beta}\log Z_i=-\frac{1}{\beta}\sum_{\sigma_i}\prod_{k\in\partial i\backslash j}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\psi_{\sigma_k}^{k\to i},\\ \Delta F_{ij}&=-\frac{1}{\beta}\log Z_{ij}=-\frac{1}{\beta}\sum_{\sigma_i,\sigma_j}e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\psi_{\sigma_i}^{i\to j}\psi_{\sigma_j}^{j\to i}. \end{align} }[/math]

Deriving Belief Propagation using directly the conditional independence

We can actually try to solve the marginals directly by adding an additional spin i into a system with n spins. [math]\displaystyle{ \begin{align} P_i(\sigma_i)&=\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k}\}}P_{N+1}(\sigma_i;\underline{\sigma_j};\underline{\sigma_k})\\ &=\frac{\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \},\{\underline{\sigma_k} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\frac{\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})}}{\sum_{\{\underline{\sigma_j} \}}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} } }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\frac{\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})}}{\sum_{\{\underline{\sigma_j} \}}\sum_{\{\underline{\sigma_k} \}}e^{ -\beta \sum_{x,y:x,y\neq i} E_{x,y}(\underline{\sigma_j},\underline{\sigma_k})} } }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}P_N(\underline{\sigma_j},\underline{\sigma_k}) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\sum_{\{\underline{\sigma_k} \}}P_N{ (\underline{\sigma_j},\underline{\sigma_k})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}P_N(\underline{\sigma_j}) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}P_N{ (\underline{\sigma_j})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\prod_{j\in\partial i}P_N(\sigma_j) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}e^{ -\beta \sum_j E_{i,j}(\sigma_i,\underline{\sigma_j})}\prod_{j\in\partial i}P_N{( {\sigma_j})} }\\ &=\frac{\sum_{\{\underline{\sigma_j} \}}\prod_{j\in\partial i}e^{ -\beta E_{i,j}(\sigma_i,\sigma_j)}\psi_{j\to i}(\sigma_j) }{\sum_{\sigma_i}\sum_{\{\underline{\sigma_j} \}}\prod_{j\in\partial i}e^{ -\beta E_{i,j}(\sigma_i,{\sigma_j})}\psi_{j\to i}{( {\sigma_j})} }\\ &=\frac{\prod_{j\in\partial i}\sum_{ {\sigma_j} }e^{ -\beta E_{i,j}(\sigma_i,\sigma_j)}\psi_{j\to i}(\sigma_j) }{\sum_{\sigma_i}\prod_{j\in\partial i}\sum_{{\sigma_j} }e^{ -\beta E_{i,j}(\sigma_i,{\sigma_j})}\psi_{j\to i}{( {\sigma_j})} } \end{align} }[/math]


Computing macro observables

The Bethe free energy can be computed as

[math]\displaystyle{ \begin{align} F&=\sum_{i}\Delta F_i-\sum_{\langle ij\rangle}\Delta F_{ij},\\ \Delta F_i&=-\frac{1}{\beta}\log Z_i=-\frac{1}{\beta}\sum_{\sigma_i}\prod_{k\in\partial i\backslash j}\sum_{\sigma_k}e^{-\beta E_{ik}(\sigma_i,\sigma_k)}\psi_{\sigma_k}^{k\to i},\\ \Delta F_{ij}&=-\frac{1}{\beta}\log Z_{ij}=-\frac{1}{\beta}\sum_{\sigma_i,\sigma_j}e^{-\beta E_{ij}(\sigma_i,\sigma_j)}\psi_{\sigma_i}^{i\to j}\psi_{\sigma_j}^{j\to i}. \end{align} }[/math]

Then average internal energy and entropy can be computed as

[math]\displaystyle{ \begin{align} E&=\sum_{i}\Delta E_i-\sum_{\langle ij\rangle}\Delta E_{ij},\\ &=\sum_{i}\frac{\partial(\beta F_i)}{\delta \beta}-\sum_{\langle ij\rangle}\frac{\partial (\beta\Delta F_{ij})}{\partial \beta}\\ S&=F-\beta E \end{align} }[/math]


Taking ensemble average using the Population Dynamics

If we use [math]\displaystyle{ \textstyle \mathcal F_{\textrm{RS}}(\{ P_{\sigma_k}^{k\to i} \} }[/math] to denote the Replica Symmetric equations written in the previous sections, then distribution of messages in the directed edges sampled from ensemble of graphs can be written as

[math]\displaystyle{ \mathcal P(P^{i\to j}_{\sigma_i})=\int \prod_{k\in\partial i\backslash j}dP_{\sigma_k}^{k\to i}\mathcal P(P_{\sigma_k}^{k\to i})\cdot\delta\left [ P^{i\to j}_{\sigma_i} - \mathcal F_{\textrm{RS}}(\{ P_{\sigma_k}^{k\to i} \}\right ] ). }[/math]

Then the macro observables like energy and entropy can be computed using the distribution of messages.