# The Ising Model and Maximum Entropy Distributions

## Definitions

n spins $\displaystyle{ \textstyle\underline \sigma\in \{+1,-1\}^n }$ are connected by couplings $\displaystyle{ \textstyle J\in\mathbb{R}^{n\times n} }$.
Different choices of $\displaystyle{ \textstyle J_{ij} }$ give different systems, for example:

• $\displaystyle{ J_{ij}=\textrm{Constant} }$: ferromagnetic model (positive constant) or antiferromagnetic model (negative constant).
• $\displaystyle{ J_{ij}\sim\mathcal{N}(0,\Delta) }$: Sherrington-Kirkpatrick model, spin glasses.
• $\displaystyle{ J_{ij} \leftarrow }$ Hebb's rule: Hopfield model, associative memories.
• $\displaystyle{ J_{ij} }$ are learned from data: neural networks.
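
As a rough illustration of the first three choices, the sketch below builds small coupling matrices in plain Python. The function names are hypothetical, and several details are assumptions not fixed by the text: a symmetric matrix with zero diagonal, $\Delta$ read as a variance, and Hebb's rule taken in its common form $J_{ij}=\frac{1}{n}\sum_\mu \xi_i^\mu\xi_j^\mu$ for stored patterns $\xi^\mu$.

```python
import math
import random

def ferromagnet_couplings(n, strength=1.0):
    """Constant coupling between distinct spins (negative strength: antiferromagnet)."""
    return [[0.0 if i == j else strength for j in range(n)] for i in range(n)]

def sk_couplings(n, delta, rng):
    """Symmetric Gaussian couplings with mean 0 and variance delta (assumed convention)."""
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # random.gauss takes a standard deviation, hence the square root.
            J[i][j] = J[j][i] = rng.gauss(0.0, math.sqrt(delta))
    return J

def hopfield_couplings(patterns):
    """Hebb's rule: J_ij = (1/n) * sum over patterns of xi_i * xi_j, zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(xi[i] * xi[j] for xi in patterns) / n
             for j in range(n)] for i in range(n)]

rng = random.Random(0)  # fixed seed for reproducibility
J_ferro = ferromagnet_couplings(4)
J_sk = sk_couplings(4, delta=1.0, rng=rng)
J_hebb = hopfield_couplings([[1, -1, 1, -1], [1, 1, -1, -1]])
```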

The energy of a configuration $\displaystyle{ \underline\sigma }$ is

$\displaystyle{ E(\underline\sigma)=-\sum_{ij}J_{ij}\sigma_i\sigma_j-\sum_i\sigma_i\theta_i, }$

where $\displaystyle{ \theta_i }$ is the external field added on spin $\displaystyle{ \textstyle i }$.
Note that throughout this discussion I set the external field to zero, because this does not qualitatively change the results we are going to show, but significantly shortens the formulas :)
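
A minimal sketch of the energy function is given below. It assumes that the sum $\sum_{ij}$ runs over all ordered pairs $(i,j)$, a convention the text does not spell out; the function name `ising_energy` and the 3-spin example are hypothetical.

```python
def ising_energy(sigma, J, theta=None):
    """E(sigma) = -sum_ij J_ij s_i s_j - sum_i theta_i s_i.

    The double sum is taken over all ordered pairs (i, j); this is an
    assumed reading of the formula.
    """
    n = len(sigma)
    if theta is None:
        theta = [0.0] * n  # zero external field
    pair = sum(J[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))
    field = sum(theta[i] * sigma[i] for i in range(n))
    return -pair - field

# Hypothetical 3-spin ferromagnet: unit coupling between distinct spins.
J = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
E_aligned = ising_energy([1, 1, 1], J)   # all spins aligned: lowest energy
E_flipped = ising_energy([1, 1, -1], J)  # one spin flipped: higher energy
```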

In the canonical ensemble, the probability of finding a configuration at equilibrium at inverse temperature $\displaystyle{ \beta }$ follows the Boltzmann distribution:

$\displaystyle{ P(\underline\sigma)=\frac{1}{Z}e^{\sum_{ij}\beta J_{ij}\sigma_i\sigma_j}, }$

where

$\displaystyle{ Z=\sum_{\underline\sigma}e^{\sum_{ij}\beta J_{ij}\sigma_i\sigma_j} }$

is the partition function.
Notice that

• There are $\displaystyle{ 2^n }$ configurations in total in the summation.
• When $\displaystyle{ \beta =0 }$, every configuration has the same Boltzmann weight, so each has probability $\displaystyle{ \textstyle 2^{-n} }$.
• When $\displaystyle{ \beta\to\infty }$, only the configurations with the lowest energy have nonzero probability.
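
These three observations can be checked by brute-force enumeration on a small system. The sketch below is only feasible for small n, since it visits all $2^n$ configurations; the function name `boltzmann_distribution` and the 3-spin ferromagnet are hypothetical, and the sum over $ij$ is again assumed to run over ordered pairs.

```python
import itertools
import math

def boltzmann_distribution(J, beta):
    """Enumerate all 2^n configurations and return (configs, probs, Z) at zero field."""
    n = len(J)
    configs = list(itertools.product([1, -1], repeat=n))
    weights = []
    for s in configs:
        exponent = beta * sum(J[i][j] * s[i] * s[j]
                              for i in range(n) for j in range(n))
        weights.append(math.exp(exponent))
    Z = sum(weights)  # partition function
    probs = [w / Z for w in weights]
    return configs, probs, Z

# Hypothetical 3-spin ferromagnet with unit couplings.
J = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]

configs, p_hot, Z_hot = boltzmann_distribution(J, beta=0.0)  # infinite temperature
_, p_cold, _ = boltzmann_distribution(J, beta=5.0)           # low temperature

# Probability mass on the two fully aligned (lowest-energy) configurations.
ground_mass = sum(p for s, p in zip(configs, p_cold) if abs(sum(s)) == 3)
```

At `beta=0.0` every entry of `p_hot` equals $2^{-3}$, while at `beta=5.0` almost all of the mass sits on the two aligned configurations.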

## Why Ising model?

In addition to physical motivations (phase transitions, criticality, ...), another reason that the Ising model is useful in modern science and technology is that it is the maximum entropy model given the first two moments of the observations. That is, among all distributions that reproduce these moments, it is the one that makes the fewest additional assumptions about the observed data.

Suppose we have m configurations $\displaystyle{ \textstyle \{\underline\sigma\}\in\{1,-1\}^{m\times n} }$ sampled from the Boltzmann distribution of the model. We can then define the following statistics, which can be computed from the data:

• Magnetization $\displaystyle{ \textstyle m_i= \frac{1}{m}\sum_{t=1}^m\sigma_i^t\approx\langle \sigma_i\rangle }$
• Correlations $\displaystyle{ \textstyle C_{ij}= \frac{1}{m}\sum_{t=1}^m\sigma_i^t\sigma_j^t\approx\langle \sigma_i\sigma_j\rangle }$
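
A sketch of how these two statistics might be estimated as sample averages over the m configurations is shown below. The toy model (3 spins, $\beta J_{ij}=0.2$ for $i\neq j$, zero field), the function name `empirical_moments`, and the choice to draw independent samples from the exactly enumerated distribution are all assumptions made for illustration.

```python
import itertools
import math
import random

def empirical_moments(samples):
    """Return (mag, corr): sample averages of sigma_i and of sigma_i * sigma_j."""
    num = len(samples)
    n = len(samples[0])
    mag = [sum(s[i] for s in samples) / num for i in range(n)]
    corr = [[sum(s[i] * s[j] for s in samples) / num for j in range(n)]
            for i in range(n)]
    return mag, corr

# Hypothetical toy model: 3 spins, beta * J_ij = 0.2 for i != j, zero field.
n, beta_J = 3, 0.2
configs = list(itertools.product([1, -1], repeat=n))
weights = [math.exp(beta_J * sum(s[i] * s[j]
                                 for i in range(n) for j in range(n) if i != j))
           for s in configs]
Z = sum(weights)
probs = [w / Z for w in weights]

# Exact expectation <sigma_0 sigma_1>, for comparison with the estimate.
exact_c01 = sum(p * s[0] * s[1] for s, p in zip(configs, probs))

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = rng.choices(configs, weights=probs, k=20000)  # independent draws
mag, corr = empirical_moments(samples)
```

With zero field the exact magnetization vanishes by symmetry, so `mag` should be close to zero, and `corr[0][1]` should be close to `exact_c01` up to sampling noise of order $1/\sqrt{m}$.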

Many distributions can generate data with the given first and second moments. Suppose $\displaystyle{ \textstyle P(\underline\sigma) }$ is such a distribution. Then we can write the entropy of the distribution as

$\displaystyle{ S_p=-\sum_{\underline\sigma}P(\underline\sigma)\log P(\underline\sigma). }$

Of course, there are constraints that need to be satisfied:

$\displaystyle{ \sum_{\underline\sigma}P(\underline\sigma)=1 }$
$\displaystyle{ \forall i,\,\, \sum_{\underline\sigma}P(\underline\sigma)\sigma_i=m_i }$
$\displaystyle{ \forall (i,j),\,\, \sum_{\underline\sigma}P(\underline\sigma)\sigma_i\sigma_j=C_{ij}. }$

We define a Lagrangian as

$\displaystyle{ \mathcal {L}_P=-\sum_{\underline\sigma}P(\underline\sigma)\log P(\underline\sigma)+\sum_i\lambda_i\left (m_i-\sum_{\underline\sigma}P(\underline\sigma)\sigma_i\right )+\sum_{ij}\lambda_{ij}\left (C_{ij}-\sum_{\underline\sigma}P(\underline\sigma)\sigma_i\sigma_j\right )+\lambda \left (\sum_{\underline\sigma}P(\underline\sigma)-1\right ), }$

where $\displaystyle{ \textstyle \lambda,\,\,\{\lambda_i\},\,\,\{\lambda_{ij}\} }$ are Lagrange multipliers.

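By setting $\displaystyle{ \textstyle \frac{\partial\mathcal {L}_P}{\partial P(\underline\sigma)}=0 }$ for every configuration $\displaystyle{ \underline\sigma }$, we have

$\displaystyle{ -\log P(\underline\sigma)-1-\sum_i\lambda_i\sigma_i-\sum_{ij}\lambda_{ij}\sigma_i\sigma_j+\lambda=0, }$

which yields

$\displaystyle{ P(\underline\sigma)=e^{\lambda-1}\,e^{-\sum_i\lambda_i\sigma_i-\sum_{ij}\lambda_{ij}\sigma_i\sigma_j}. }$

The normalization constraint fixes $\displaystyle{ \textstyle e^{\lambda-1}=1/Z }$. Identifying $\displaystyle{ \textstyle \lambda_{ij}=-\beta J_{ij} }$ and $\displaystyle{ \textstyle \lambda_i=-\beta\theta_i }$, and setting the external field to zero as before, we recover the Boltzmann distribution of the Ising model:

$\displaystyle{ P(\underline\sigma)=\frac{1}{Z}e^{\sum_{ij}\beta J_{ij}\sigma_i\sigma_j}. }$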
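
The maximum entropy property can be checked numerically on a small system. The sketch below is an illustration under stated assumptions, not part of the derivation: it uses 3 spins with hypothetical couplings `K` standing for $\beta J$, and builds a second distribution by adding a small multiple of the three-spin product $\sigma_1\sigma_2\sigma_3$ to the Ising distribution. For 3 spins this perturbation leaves the normalization and all first and second moments unchanged, so both distributions satisfy the same constraints, yet the perturbed one should have strictly lower entropy.

```python
import itertools
import math

def entropy(probs):
    """Shannon entropy -sum p log p (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def moments(configs, probs):
    """First moments <s_i> followed by second moments <s_i s_j> for i < j."""
    n = len(configs[0])
    first = [sum(p * s[i] for s, p in zip(configs, probs)) for i in range(n)]
    second = [sum(p * s[i] * s[j] for s, p in zip(configs, probs))
              for i in range(n) for j in range(i + 1, n)]
    return first + second

n = 3
# Hypothetical values of beta * J_ij (symmetric, zero diagonal).
K = [[0.0, 0.3, -0.2],
     [0.3, 0.0, 0.5],
     [-0.2, 0.5, 0.0]]

configs = list(itertools.product([1, -1], repeat=n))
weights = [math.exp(sum(K[i][j] * s[i] * s[j] for i in range(n) for j in range(n)))
           for s in configs]
Z = sum(weights)
p_ising = [w / Z for w in weights]

# Perturb along s_0 * s_1 * s_2, keeping every probability positive.
eps = 0.5 * min(p_ising)
p_other = [p + eps * s[0] * s[1] * s[2] for s, p in zip(configs, p_ising)]

same_moments = all(abs(a - b) < 1e-12
                   for a, b in zip(moments(configs, p_ising), moments(configs, p_other)))
entropy_gap = entropy(p_ising) - entropy(p_other)  # expected to be positive
```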