更改

奇异值分解（SVD） (查看源代码)

2024年11月5日 (二) 11:05的版本

添加508字节、 2024年11月5日 (星期二)

无编辑摘要

第42行：第42行：

===<math>\mathbf{U}</math>和<math>\mathbf{V}</math>的列构成正交标准基===

−

由于<math>\mathbf{U}</math>和<math>\mathbf{V}^*</math>~~都是酉矩阵，它们的列分别形成一组正交标准向量（orthonormal vectors），我们可以将其视为基向量（basis~~ vectors）。矩阵<math>\mathbf{M}</math>把基向量<math>\mathbf{V}_i</math>映射到拉伸后的单位向量<math>\sigma_i\mathbf{U}_i</math>上。根据酉矩阵的定义，它们的共轭转置<math>\mathbf{U}^*</math>和<math>\mathbf{V}</math>也具有相同性质，只是失去了奇异值作为拉伸的几何解释。简言之，<math>\mathbf{U}</math>、<math>\mathbf{U}^*</math>、<math>\mathbf{V}</math>和<math>\mathbf{V}^*</math>~~的列都构成标准正交基（orthonormal~~ bases）。

+

由于<math>\mathbf{U}</math>和<math>\mathbf{V}^*</math>都是酉矩阵，它们的列分别形成一组[[正交标准向量]]（orthonormal vectors），我们可以将其视为[[基向量]]（basis vectors）。矩阵<math>\mathbf{M}</math>把基向量<math>\mathbf{V}_i</math>映射到拉伸后的单位向量<math>\sigma_i\mathbf{U}_i</math>上。根据酉矩阵的定义，它们的共轭转置<math>\mathbf{U}^*</math>和<math>\mathbf{V}</math>也具有相同性质，只是失去了奇异值作为拉伸的几何解释。简言之，<math>\mathbf{U}</math>、<math>\mathbf{U}^*</math>、<math>\mathbf{V}</math>和<math>\mathbf{V}^*</math>的列都构成[[标准正交基]]（orthonormal bases）。

−

当<math>\mathbf{M}</math>~~是正半定厄米特矩阵（positive~~-semidefinite Hermitian matrix）时，<math>\mathbf{U}</math>和<math>\mathbf{V}</math>都等同于用于对角化<math>\mathbf{M}</math>的酉矩阵。然而，如果<math>\mathbf{M}</math>~~不是正半定厄米特矩阵但仍可对角化，那么其特征分解和奇异值分解就会有所不同。~~

+

当<math>\mathbf{M}</math>是[[正半定厄米特矩阵]]（positive-semidefinite Hermitian matrix）时，<math>\mathbf{U}</math>和<math>\mathbf{V}</math>都等同于用于对角化<math>\mathbf{M}</math>的酉矩阵。然而，如果<math>\mathbf{M}</math>不是正半定厄米特矩阵但仍可对角化，那么其[[特征分解]]和奇异值分解就会有所不同。

===与四个基本子空间的关系：===

−

* <math>\mathbf{U}</math>的前<math>r</math>列构成了<math>\mathbf{M}</math>~~列空间的基。~~

+

* <math>\mathbf{U}</math>的前<math>r</math>列构成了<math>\mathbf{M}</math>[[列空间]]的基。

−

* <math>\mathbf{U}</math>的后<math>m-r</math>列则构成<math>\mathbf{M}^*</math>~~零空间的基。~~

+

* <math>\mathbf{U}</math>的后<math>m-r</math>列则构成<math>\mathbf{M}^*</math>[[零空间]]的基。

−

* <math>\mathbf{V}</math>的前<math>r</math>列构成<math>\mathbf{M}^*</math>列空间的基（在实数情况下即为<math>\mathbf{M}</math>~~行空间的基）。~~

+

* <math>\mathbf{V}</math>的前<math>r</math>列构成<math>\mathbf{M}^*</math>列空间的基（在实数情况下即为<math>\mathbf{M}</math>[[行空间]]的基）。

* <math>\mathbf{V}</math>的后<math>n-r</math>列构成<math>\mathbf{M}</math>零空间的基。

===几何意义：===

−

由于<math>\mathbf{U}</math>和<math>\mathbf{V}</math>是酉矩阵，<math>\mathbf{U}</math>的列<math>\mathbf{U}_1,\ldots,\mathbf{U}_m</math>构成<math>K^m</math>~~的正交标准基，~~<math>\mathbf{V}</math>的列<math>\mathbf{V}_1,\ldots,\mathbf{V}_n</math>构成<math>K^n</math>~~的正交标准基（基于这些空间的标量积（scalar~~ products））。

+

由于<math>\mathbf{U}</math>和<math>\mathbf{V}</math>是酉矩阵，<math>\mathbf{U}</math>的列<math>\mathbf{U}_1,\ldots,\mathbf{U}_m</math>构成<math>K^m</math>的[[正交标准基]]，<math>\mathbf{V}</math>的列<math>\mathbf{V}_1,\ldots,\mathbf{V}_n</math>构成<math>K^n</math>的正交标准基（基于这些空间的[[标量积]]（scalar products））。

−

~~线性变换定义为：~~

+

[[线性变换]]定义为：

<math>

第71行：第71行：

SVD定理的几何含义可以概括为：对于每个线性映射<math>T:K^n\to K^m</math>，我们能找到<math>K^n</math>和<math>K^m</math>的正交标准基，使<math>T</math>将<math>K^n</math>的第<math>i</math>个基向量映射到<math>K^m</math>的第<math>i</math>个基向量的非负倍数，并将剩余基向量映射到零。在这些基下，映射<math>T</math>表现为一个具有非负实数对角元素的对角矩阵。

−

为了更直观地理解奇异值和SVD分解（至少在实向量空间中），我们可以考虑<math>\mathbf{R}^n</math>中半径为1的球面<math>S</math>。线性映射<math>T</math>将这个球面映射到<math>\mathbf{R}^m</math>~~中的一个椭球体。非零奇异值就是这个椭球体半轴（semi~~-axes）的长度。特别地，当<math>n=m</math>且所有奇异值都不同且非零时，线性映射<math>T</math>的SVD可以分解为三个连续步骤：

+

为了更直观地理解奇异值和SVD分解（至少在实向量空间中），我们可以考虑<math>\mathbf{R}^n</math>中半径为1的球面<math>S</math>。线性映射<math>T</math>将这个球面映射到<math>\mathbf{R}^m</math>中的一个椭球体。非零奇异值就是这个椭球体[[半轴]]（semi-axes）的长度。特别地，当<math>n=m</math>且所有奇异值都不同且非零时，线性映射<math>T</math>的SVD可以分解为三个连续步骤：

1. 考虑椭球体<math>T(S)</math>及其轴，然后找出<math>\mathbf{R}^n</math>中被<math>T</math>映射到这些轴上的方向。这些方向恰好相互正交。首先应用等距变换<math>\mathbf{V}^*</math>，将这些方向送到<math>\mathbf{R}^n</math>的坐标轴。

−

2. ~~应用沿坐标轴对角化的自同态（endomorphism）~~<math>\mathbf{D}</math>，在每个方向上进行拉伸或收缩，使用<math>T(S)</math>的半轴长度作为拉伸系数。组合<math>\mathbf{D} \circ \mathbf{V}^*</math>将单位球面送到与<math>T(S)</math>等距的椭球体。

+

2. 应用沿坐标轴对角化的[[自同态]]（endomorphism）<math>\mathbf{D}</math>，在每个方向上进行拉伸或收缩，使用<math>T(S)</math>的半轴长度作为拉伸系数。组合<math>\mathbf{D} \circ \mathbf{V}^*</math>将单位球面送到与<math>T(S)</math>等距的椭球体。

3. 最后，对这个椭球体应用等距变换<math>\mathbf{U}</math>以得到<math>T(S)</math>。

第124行：第124行：

</math>

−

注意，缩放矩阵<math>\mathbf{\Sigma}</math>在对角线以外的元素都为零（用灰色斜体表示），且有一个对角线元素为零（用红色粗体表示）。由于<math>\mathbf{U}</math>和<math>\mathbf{V}^*</math>~~都是酉矩阵，将它们分别与其共轭转置相乘会得到单位矩阵（identity~~ matrices）。在这个例子中，<math>\mathbf{U}</math>和<math>\mathbf{V}^*</math>~~都是实值矩阵，因此它们都是正交矩阵。我们可以验证：~~

+

注意，缩放矩阵<math>\mathbf{\Sigma}</math>在对角线以外的元素都为零（用灰色斜体表示），且有一个对角线元素为零（用红色粗体表示）。由于<math>\mathbf{U}</math>和<math>\mathbf{V}^*</math>都是酉矩阵，将它们分别与其共轭转置相乘会得到[[单位矩阵]]（identity matrices）。在这个例子中，<math>\mathbf{U}</math>和<math>\mathbf{V}^*</math>都是实值矩阵，因此它们都是[[正交矩阵]]。我们可以验证：

<math>

第178行：第178行：

* 一个 <math>m \times n</math> 矩阵 <math>\mathbf{M}</math> 最多有 <math>p</math> 个不同的奇异值。

−

* 我们总能在 ⁠<math>\mathbf{K^m}</math>⁠ ~~空间中找到一组酉基~~ <math>\mathbf{U}⁠</math>⁠，其中部分基向量，张成了矩阵 <math>⁠\mathbf{M}</math>⁠中每个奇异值对应的左奇异向量。

+

* 我们总能在 ⁠<math>\mathbf{K^m}</math>⁠ 空间中找到一组[[酉基]] <math>\mathbf{U}⁠</math>⁠，其中部分基向量，张成了矩阵 <math>⁠\mathbf{M}</math>⁠中每个奇异值对应的左奇异向量。

* 同样地，在 <math>⁠\mathbf{K^n}</math>⁠ 空间中也存在一组酉基 <math>\mathbf{V}⁠</math>⁠，其中部分基向量，张成了矩阵 <math>⁠\mathbf{M}⁠</math>中每个奇异值对应的右奇异向量。

当我们能找到两个线性独立的左（或右）奇异向量时，我们称该奇异值为简并（degenerate）的。如果 <math>\mathbf{u}_1</math> 和 <math>\mathbf{u}_2</math> 是对应奇异值 <math>\sigma</math> 的两个左奇异向量，那么这两个向量的任何归一化线性组合也是对应奇异值 <math>\sigma</math> 的左奇异向量。右奇异向量也有类似性质。独立的左奇异向量和右奇异向量数量相同，它们分别出现在 <math>\mathbf{U}</math> 和 <math>\mathbf{V}</math> 的对应列中，这些列对应着 <math>\mathbf{\Sigma}</math> 中值为 <math>\sigma</math> 的对角元素。

−

特别地，奇异值为0的左奇异向量和右奇异向量分别包括 <math>\mathbf{M}</math> ~~的余核（cokernel）和核（kernel）中的所有单位向量。根据秩~~-~~零化度定理（rank–nullity~~ theorem），如果 <math>m \neq n</math>，它们的维数不可能相同。即使所有奇异值都非零，当 <math>m > n</math> 时，余核是非平凡（nontrivial）的，此时 <math>\mathbf{U}</math> 用余核中的 <math>m-n</math> 个正交向量填充。相反，当 <math>m < n</math> 时，<math>\mathbf{V}</math> 由核中的 <math>n-m</math> 个正交向量填充。然而，如果存在0的奇异值，<math>\mathbf{U}</math> 或 <math>\mathbf{V}</math> 的额外列已经作为左奇异向量或右奇异向量出现。

+

特别地，奇异值为0的左奇异向量和右奇异向量分别包括 <math>\mathbf{M}</math> 的[[余核]]（cokernel）和[[核]]（kernel）中的所有单位向量。根据[[秩-零化度定理]]（rank–nullity theorem），如果 <math>m \neq n</math>，它们的维数不可能相同。即使所有奇异值都非零，当 <math>m > n</math> 时，余核是非平凡（nontrivial）的，此时 <math>\mathbf{U}</math> 用余核中的 <math>m-n</math> 个正交向量填充。相反，当 <math>m < n</math> 时，<math>\mathbf{V}</math> 由核中的 <math>n-m</math> 个正交向量填充。然而，如果存在0的奇异值，<math>\mathbf{U}</math> 或 <math>\mathbf{V}</math> 的额外列已经作为左奇异向量或右奇异向量出现。

非简并奇异值总有唯一的左奇异向量和右奇异向量，只能乘以单位相位因子（unit-phase factor） <math>e^{i\varphi}</math>（实数情况下，只能乘以正负号）。因此，如果方阵 <math>\mathbf{M}</math> 的所有奇异值都是非简并且非零的，那么它的奇异值分解是唯一的，只能对 <math>\mathbf{U}</math> 的一列乘以单位相位因子，同时对 <math>\mathbf{V}</math> 的相应列乘以相同的单位相位因子。总的来说，SVD 在对 <math>\mathbf{U}</math> 和 <math>\mathbf{V}</math> 的列向量（这些列向量张成每个奇异值的子空间）统一应用任意酉变换，以及对 <math>\mathbf{U}</math> 和 <math>\mathbf{V}</math> 的向量（这些向量分别张成 <math>\mathbf{M}</math> 的核和余核）应用任意酉变换的情况下是唯一的。

第207行：第207行：

等式右侧描述了左侧的特征值分解。由此可得：

−

* <math>\mathbf{V}</math> 的列（即右奇异向量）是 <math>\mathbf{M}^*\mathbf{M}</math> ~~的特征向量（eigenvectors）。~~

+

* <math>\mathbf{V}</math> 的列（即右奇异向量）是 <math>\mathbf{M}^*\mathbf{M}</math> 的[[特征向量]]（eigenvectors）。

* <math>\mathbf{U}</math> 的列（即左奇异向量）是 <math>\mathbf{M}\mathbf{M}^*</math> 的特征向量。

−

* <math>\mathbf{\Sigma}</math> 的非零元素（即非零奇异值）是 <math>\mathbf{M}^*\mathbf{M}</math> 或 <math>\mathbf{M}\mathbf{M}^*</math> ~~非零特征值的平方根。~~

+

* <math>\mathbf{\Sigma}</math> 的非零元素（即非零奇异值）是 <math>\mathbf{M}^*\mathbf{M}</math> 或 <math>\mathbf{M}\mathbf{M}^*</math> 非零[[特征值]]的平方根。

−

当 <math>\mathbf{M}</math> ~~为正规矩阵（normal matrix）时，根据谱定理（spectral theorem），我们可以用特征向量的基对其进行酉对角化，得到分解~~ <math>\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{U}^*</math>。其中 <math>\mathbf{U}</math> 是酉矩阵，<math>\mathbf{D}</math> 是对角线上有复数元素 <math>\sigma_i</math> 的对角矩阵。若 <math>\mathbf{M}</math> ~~为半正定（positive~~ semi-definite）矩阵，则 <math>\sigma_i</math> 为非负实数，此时分解 <math>\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{U}^*</math> 也是一个奇异值分解。否则，我们可以将每个 <math>\sigma_i</math> 的相位 <math>e^{i\varphi}</math> 移到相应的 <math>\mathbf{V}_i</math> 或 <math>\mathbf{U}_i</math> ~~中，从而重新表示为SVD形式。SVD与非正规矩阵的联系主要体现在极分解定理：~~<math>\mathbf{M} = \mathbf{S}\mathbf{R}</math>，其中 <math>\mathbf{S} = \mathbf{U}\mathbf{\Sigma}\mathbf{U}^*</math> 是半正定且正规的，<math>\mathbf{R} = \mathbf{U}\mathbf{V}^*</math> 是酉的。

+

当 <math>\mathbf{M}</math> 为[[正规矩阵]]（normal matrix）时，根据[谱定理]]（spectral theorem），我们可以用[[特征向量]]的基对其进行[[酉对角化]]，得到分解 <math>\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{U}^*</math>。其中 <math>\mathbf{U}</math> 是酉矩阵，<math>\mathbf{D}</math> 是对角线上有复数元素 <math>\sigma_i</math> 的对角矩阵。若 <math>\mathbf{M}</math> 为[[半正定]]（positive semi-definite）矩阵，则 <math>\sigma_i</math> 为非负实数，此时分解 <math>\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{U}^*</math> 也是一个奇异值分解。否则，我们可以将每个 <math>\sigma_i</math> 的相位 <math>e^{i\varphi}</math> 移到相应的 <math>\mathbf{V}_i</math> 或 <math>\mathbf{U}_i</math> 中，从而重新表示为SVD形式。SVD与非正规矩阵的联系主要体现在[[极分解]]定理：<math>\mathbf{M} = \mathbf{S}\mathbf{R}</math>，其中 <math>\mathbf{S} = \mathbf{U}\mathbf{\Sigma}\mathbf{U}^*</math> 是半正定且正规的，<math>\mathbf{R} = \mathbf{U}\mathbf{V}^*</math> 是酉的。

−

因此，除半正定矩阵外，<math>\mathbf{M}</math> 的特征值分解和SVD虽有关联，但并不相同：特征值分解形如 <math>\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{U}^{-1}</math>，其中 <math>\mathbf{U}</math> 不一定是酉的，<math>\mathbf{D}</math> 不一定是半正定的；而SVD形如 <math>\mathbf{M} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math>，其中 <math>\mathbf{\Sigma}</math> 是对角的且半正定的，<math>\mathbf{U}</math> 和 <math>\mathbf{V}</math> 是酉矩阵，它们除了通过矩阵 <math>\mathbf{M}</math> ~~外不一定有关联。值得注意的是，只有非亏损（non~~-defective）方阵才有特征值分解，而任何 <math>m \times n</math> 矩阵都存在SVD。

+

因此，除半正定矩阵外，<math>\mathbf{M}</math> 的特征值分解和SVD虽有关联，但并不相同：特征值分解形如 <math>\mathbf{M} = \mathbf{U}\mathbf{D}\mathbf{U}^{-1}</math>，其中 <math>\mathbf{U}</math> 不一定是酉的，<math>\mathbf{D}</math> 不一定是半正定的；而SVD形如 <math>\mathbf{M} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math>，其中 <math>\mathbf{\Sigma}</math> 是对角的且半正定的，<math>\mathbf{U}</math> 和 <math>\mathbf{V}</math> 是酉矩阵，它们除了通过矩阵 <math>\mathbf{M}</math> 外不一定有关联。值得注意的是，只有[[非亏损]]（non-defective）方阵才有特征值分解，而任何 <math>m \times n</math> 矩阵都存在SVD。

==奇异值分解的应用==

第219行：第219行：

===伪逆===

−

~~我们可以利用奇异值分解计算矩阵的伪逆。设矩阵~~ <math>\mathbf{M}</math> 的奇异值分解为 <math>\mathbf{M} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math>，则其伪逆为：

+

我们可以利用奇异值分解计算矩阵的[[伪逆]]。设矩阵 <math>\mathbf{M}</math> 的奇异值分解为 <math>\mathbf{M} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math>，则其伪逆为：

<math>\mathbf{M}^+ = \mathbf{V}\mathbf{\Sigma}^+\mathbf{U}^*</math>

−

这里，<math>\mathbf{\Sigma}^+</math> 是 <math>\mathbf{\Sigma}</math> ~~的伪逆，我们通过将每个非零对角元素取倒数并转置来得到它。伪逆在解决线性最小二乘问题（linear~~ least squares problems）时常常派上用场。

+

这里，<math>\mathbf{\Sigma}^+</math> 是 <math>\mathbf{\Sigma}</math> 的伪逆，我们通过将每个非零对角元素取倒数并转置来得到它。伪逆在解决[[线性最小二乘问题]]（linear least squares problems）时常常派上用场。

===求解齐次线性方程===

−

~~我们可以将一组齐次线性方程（homogeneous~~ linear equations）表示为 <math>\mathbf{A}\mathbf{x} = \mathbf{0}</math>，其中 <math>\mathbf{A}</math> 是矩阵，<math>\mathbf{x}</math> 是向量。通常，我们已知 <math>\mathbf{A}</math>，需要找出满足方程的非零 <math>\mathbf{x}</math>。这样的 <math>\mathbf{x}</math> 属于 <math>\mathbf{A}</math> ~~的零空间，也称为~~ <math>\mathbf{A}</math> 的（右）零向量。我们可以将向量 <math>\mathbf{x}</math> 描述为与 <math>\mathbf{A}</math> 的零奇异值对应的右奇异向量。这意味着，如果 <math>\mathbf{A}</math> ~~是方阵且没有零奇异值，则方程没有非零解。若有多个零奇异值，那么对应的右奇异向量的任意线性组合都是有效解。类似地，满足~~ <math>\mathbf{x}^*\mathbf{A} = \mathbf{0}</math> 的非零 <math>\mathbf{x}</math>（其中 <math>\mathbf{x}^*</math> 是 <math>\mathbf{x}</math> 的共轭转置）被称为 <math>\mathbf{A}</math> 的左零向量。

+

我们可以将一组[[齐次线性方程]]（homogeneous linear equations）表示为 <math>\mathbf{A}\mathbf{x} = \mathbf{0}</math>，其中 <math>\mathbf{A}</math> 是矩阵，<math>\mathbf{x}</math> 是向量。通常，我们已知 <math>\mathbf{A}</math>，需要找出满足方程的非零 <math>\mathbf{x}</math>。这样的 <math>\mathbf{x}</math> 属于 <math>\mathbf{A}</math> 的[[零空间]]，也称为 <math>\mathbf{A}</math> 的（右）零向量。我们可以将向量 <math>\mathbf{x}</math> 描述为与 <math>\mathbf{A}</math> 的零奇异值对应的右奇异向量。这意味着，如果 <math>\mathbf{A}</math> 是[[方阵]]且没有零奇异值，则方程没有非零解。若有多个零奇异值，那么对应的右奇异向量的任意线性组合都是有效解。类似地，满足 <math>\mathbf{x}^*\mathbf{A} = \mathbf{0}</math> 的非零 <math>\mathbf{x}</math>（其中 <math>\mathbf{x}^*</math> 是 <math>\mathbf{x}</math> 的共轭转置）被称为 <math>\mathbf{A}</math> 的左零向量。

===总体最小二乘最小化===

−

~~总体最小二乘问题（total~~ least squares）旨在找到一个向量 <math>\mathbf{x}</math>，使得在 <math>\|\mathbf{x}\| = 1</math> 的约束下（用双竖线 "||" 包围向量表示对该向量求范数），<math>\mathbf{A}\mathbf{x}</math> 的2-范数（2-norm）最小。结果表明，解就是 <math>\mathbf{A}</math> 最小奇异值对应的右奇异向量。

+

[[总体最小二乘问题]]（total least squares）旨在找到一个向量 <math>\mathbf{x}</math>，使得在 <math>\|\mathbf{x}\| = 1</math> 的约束下（用双竖线 "||" 包围向量表示对该向量求范数），<math>\mathbf{A}\mathbf{x}</math> 的2-范数（2-norm）最小。结果表明，解就是 <math>\mathbf{A}</math> 最小奇异值对应的右奇异向量。

===值域、零空间和秩===

−

SVD还能为矩阵 <math>\mathbf{M}</math> ~~的值域和零空间提供明确表示。~~<math>\mathbf{M}</math> 零奇异值对应的右奇异向量张成（span）其零空间，非零奇异值对应的左奇异向量张成其值域。例如，在前面的例子中，零空间由 <math>\mathbf{V}^*</math> 的最后一行张成，值域由 <math>\mathbf{U}</math> 的前三列张成。

+

SVD还能为矩阵 <math>\mathbf{M}</math> 的[[值域]]和[[零空间]]提供明确表示。<math>\mathbf{M}</math> 零奇异值对应的右奇异向量张成（span）其零空间，非零奇异值对应的左奇异向量张成其值域。例如，在前面的例子中，零空间由 <math>\mathbf{V}^*</math> 的最后一行张成，值域由 <math>\mathbf{U}</math> 的前三列张成。

−

因此，<math>\mathbf{M}</math> ~~的秩等于非零奇异值的个数，也就是~~ <math>\mathbf{\Sigma}</math> 中非零对角元素的个数。在数值线性代数中，我们可以用奇异值确定矩阵的有效秩，因为舍入误差（rounding error）可能导致秩亏矩阵（rank deficiency matrix）出现小但非零的奇异值。我们通常认为超过显著间隙的奇异值在数值上等同于零。

+

因此，<math>\mathbf{M}</math> 的[[秩]]等于非零奇异值的个数，也就是 <math>\mathbf{\Sigma}</math> 中非零对角元素的个数。在数值线性代数中，我们可以用奇异值确定矩阵的有效秩，因为舍入误差（rounding error）可能导致秩亏矩阵（rank deficiency matrix）出现小但非零的奇异值。我们通常认为超过显著间隙的奇异值在数值上等同于零。

===低秩矩阵近似===

−

一些实际应用需要用另一个特定秩 r 的矩阵 <math>\tilde{\mathbf{M}}</math>~~（称为截断矩阵truncated）来近似矩阵~~ <math>\mathbf{M}</math>。如果我们要在 <math>\operatorname{rank}(\tilde{\mathbf{M}}) = r</math> 的约束下最小化 <math>\mathbf{M}</math> 和 <math>\tilde{\mathbf{M}}</math> 之间差的[[Frobenius范数]]，那么解就由 <math>\mathbf{M}</math> 的SVD给出：

+

一些实际应用需要用另一个特定秩 r 的矩阵 <math>\tilde{\mathbf{M}}</math>（称为[[截断矩阵]]truncated）来近似矩阵 <math>\mathbf{M}</math>。如果我们要在 <math>\operatorname{rank}(\tilde{\mathbf{M}}) = r</math> 的约束下最小化 <math>\mathbf{M}</math> 和 <math>\tilde{\mathbf{M}}</math> 之间差的[[Frobenius范数]]，那么解就由 <math>\mathbf{M}</math> 的SVD给出：

<math>\tilde{\mathbf{M}} = \mathbf{U}\tilde{\mathbf{\Sigma}}\mathbf{V}^*</math>

−

这里，<math>\tilde{\mathbf{\Sigma}}</math> 与 <math>\mathbf{\Sigma}</math> 相同，但只保留 r ~~个最大的奇异值（其他奇异值置零）。这就是Eckart–Young定理，由这两位作者于1936年证明（尽管后来发现早期学者已经了解这一结果）~~<ref>{{cite journal |last=Stewart |first=G. W. |title=On the Early History of the Singular Value Decomposition |journal=SIAM Review |volume=35 |issue=4 |pages=551–566 |year=1993 |doi=10.1137/1035134 |hdl=1903/566 |jstor=2132388 |citeseerx=10.1.1.23.1831}}</ref>。

+

这里，<math>\tilde{\mathbf{\Sigma}}</math> 与 <math>\mathbf{\Sigma}</math> 相同，但只保留 r 个最大的奇异值（其他奇异值置零）。这就是[[Eckart–Young定理]]，由这两位作者于1936年证明（尽管后来发现早期学者已经了解这一结果）<ref>{{cite journal |last=Stewart |first=G. W. |title=On the Early History of the Singular Value Decomposition |journal=SIAM Review |volume=35 |issue=4 |pages=551–566 |year=1993 |doi=10.1137/1035134 |hdl=1903/566 |jstor=2132388 |citeseerx=10.1.1.23.1831}}</ref>。

===可分离模型===

−

我们可以将SVD视为把矩阵分解成加权、有序的可分离矩阵之和。所谓可分离，指的是矩阵 <math>\mathbf{A}</math> ~~可以表示为两个向量的外积（outer~~ product）<math>\mathbf{A} = \mathbf{u} \otimes \mathbf{v}</math>，用坐标表示即 <math>A_{ij} = u_i v_j</math>。具体来说，矩阵 <math>\mathbf{M}</math> 的分解如下：

+

我们可以将SVD视为把矩阵分解成加权、有序的可分离矩阵之和。所谓可分离，指的是矩阵 <math>\mathbf{A}</math> 可以表示为两个向量的[[外积]]（outer product）<math>\mathbf{A} = \mathbf{u} \otimes \mathbf{v}</math>，用坐标表示即 <math>A_{ij} = u_i v_j</math>。具体来说，矩阵 <math>\mathbf{M}</math> 的分解如下：

<math>\mathbf{M} = \sum_i \mathbf{A}_i = \sum_i \sigma_i \mathbf{U}_i \otimes \mathbf{V}_i</math>

第255行：第255行：

这里，<math>\mathbf{U}_i</math> 和 <math>\mathbf{V}_i</math> 分别是SVD矩阵的第i列，<math>\sigma_i</math> 是有序奇异值，每个 <math>\mathbf{A}_i</math> 都是可分离的。在图像处理中，我们常用SVD将滤波器分解为水平和垂直的可分离滤波器。值得注意的是，非零 <math>\sigma_i</math> 的数量恰好等于矩阵的秩。

−

可分离模型在生物系统中很常见，SVD分解在分析这些系统时非常有用。例如，我们可以用空间域的Gabor滤波器乘以时间域的调制函数来很好地描述一些视觉区V1简单细胞的感受野<ref>{{citation | last1=DeAngelis | first1=G. C. | last2 = Ohzawa | first2 = I.| last3 = Freeman | first3 = R. D. | title="Receptive-field dynamics in the central visual pathways" | journal=Trends Neurosci | date = October 1995 | volume=18 | issue=10| pages=451–8. | doi=10.1016/0166-2236(95)94496-R| pmid=8545912| s2cid=12827601}}</ref>~~。因此，如果我们通过反向相关（reverse~~ correlation）等方法评估得到线性滤波器，就可以将两个空间维度重排为一个维度，得到一个二维滤波器（空间，时间），然后进行SVD分解。在SVD分解中，<math>\mathbf{U}</math> 的第一列就是Gabor，而 <math>\mathbf{V}</math> 的第一列代表时间调制（或反之）。

+

可分离模型在生物系统中很常见，SVD分解在分析这些系统时非常有用。例如，我们可以用空间域的[[Gabor滤波器]]乘以时间域的调制函数来很好地描述一些视觉区V1简单细胞的感受野<ref>{{citation | last1=DeAngelis | first1=G. C. | last2 = Ohzawa | first2 = I.| last3 = Freeman | first3 = R. D. | title="Receptive-field dynamics in the central visual pathways" | journal=Trends Neurosci | date = October 1995 | volume=18 | issue=10| pages=451–8. | doi=10.1016/0166-2236(95)94496-R| pmid=8545912| s2cid=12827601}}</ref>。因此，如果我们通过[[反向相关]]（reverse correlation）等方法评估得到线性滤波器，就可以将两个空间维度重排为一个维度，得到一个二维滤波器（空间，时间），然后进行SVD分解。在SVD分解中，<math>\mathbf{U}</math> 的第一列就是Gabor，而 <math>\mathbf{V}</math> 的第一列代表时间调制（或反之）。

我们可以定义一个可分离性指数：

第265行：第265行：

===最近正交矩阵===

−

我们可以利用方阵 <math>\mathbf{A}</math> 的SVD来找出最接近 <math>\mathbf{A}</math> ~~的正交矩阵~~ <math>\mathbf{O}</math>。这里，我们用 <math>\mathbf{O} - \mathbf{A}</math> ~~的Frobenius范数来衡量接近程度。解是~~ <math>\mathbf{U}\mathbf{V}^*</math> 的乘积<ref>{{citation| url=https://people.wou.edu/~beavers/Talks/Willamette1106.pdf| title=The Singular Value Decomposition in Symmetric (Lowdin) Orthogonalization and Data Compression}}</ref>。这个结果在直觉上是合理的，因为正交矩阵会有分解 <math>\mathbf{U}\mathbf{I}\mathbf{V}^*</math>，其中 <math>\mathbf{I}</math> 是单位矩阵。所以如果 <math>\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math>，那么乘积 <math>\mathbf{A} = \mathbf{U}\mathbf{V}^*</math> 相当于用1替换所有奇异值。等价地，解就是极分解 <math>\mathbf{M} = \mathbf{R}\mathbf{P} = \mathbf{P}'\mathbf{R}</math> 中的酉矩阵 <math>\mathbf{R} = \mathbf{U}\mathbf{V}^*</math>，无论拉伸和旋转的顺序如何。

+

我们可以利用方阵 <math>\mathbf{A}</math> 的SVD来找出最接近 <math>\mathbf{A}</math> 的[[正交矩阵]] <math>\mathbf{O}</math>。这里，我们用 <math>\mathbf{O} - \mathbf{A}</math> 的[[Frobenius]]范数来衡量接近程度。解是 <math>\mathbf{U}\mathbf{V}^*</math> 的乘积<ref>{{citation| url=https://people.wou.edu/~beavers/Talks/Willamette1106.pdf| title=The Singular Value Decomposition in Symmetric (Lowdin) Orthogonalization and Data Compression}}</ref>。这个结果在直觉上是合理的，因为正交矩阵会有分解 <math>\mathbf{U}\mathbf{I}\mathbf{V}^*</math>，其中 <math>\mathbf{I}</math> 是单位矩阵。所以如果 <math>\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math>，那么乘积 <math>\mathbf{A} = \mathbf{U}\mathbf{V}^*</math> 相当于用1替换所有奇异值。等价地，解就是极分解 <math>\mathbf{M} = \mathbf{R}\mathbf{P} = \mathbf{P}'\mathbf{R}</math> 中的酉矩阵 <math>\mathbf{R} = \mathbf{U}\mathbf{V}^*</math>，无论拉伸和旋转的顺序如何。

−

~~在形状分析中，有一个类似的问题叫做正交普鲁克问题（orthogonal~~ Procrustes problem），它涉及找到一个最接近将 <math>\mathbf{A}</math> 映射到 <math>\mathbf{B}</math> 的正交矩阵 <math>\mathbf{O}</math>。具体来说：

+

在[[形状分析]]中，有一个类似的问题叫做[[正交普鲁克问题]]（orthogonal Procrustes problem），它涉及找到一个最接近将 <math>\mathbf{A}</math> 映射到 <math>\mathbf{B}</math> 的正交矩阵 <math>\mathbf{O}</math>。具体来说：

<math>\mathbf{O} = \underset{\Omega}{\operatorname{argmin}} \|\mathbf{A}\mathbf{\Omega} - \mathbf{B}\|_F \quad \text{subject to} \quad \mathbf{\Omega}^T\mathbf{\Omega} = \mathbf{I}</math>

第277行：第277行：

===Kabsch算法===

−

Kabsch算法（在其他领域称为Wahba问题）运用SVD来计算最优旋转（关于最小二乘最小化），以将一组点与相应的另一组点对齐。这种算法在多个领域都有应用，比如用于比较分子结构。

+

[[Kabsch算法]]（在其他领域称为[[Wahba问题]]）运用SVD来计算最优旋转（关于最小二乘最小化），以将一组点与相应的另一组点对齐。这种算法在多个领域都有应用，比如用于比较分子结构。

===信号处理===

−

~~研究者已成功将SVD和伪逆应用于信号处理~~ <ref>{{citation | last1=Sahidullah|first1= Md.| last2= Kinnunen| last3= Tomi| date=March 2016| title= "Local spectral variability features for speaker verification"| journal= Digital Signal Processing| volume= 50| pages= 1–11| doi=10.1016/j.dsp.2015.10.011}}</ref>、[[图像处理]] <ref>{{citation | last1=Mademlis| first1= Ioannis| last2=Tefas| first2=Anastasios| last3= Pitas| first3= Ioannis| date= 2018| title= "Regularized SVD-Based Video Frame Saliency for Unsupervised Activity Video Summarization"| url= https://ieeexplore.ieee.org/document/8462274| publisher= 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)| publisher=IEEE| pages= 2691–2695| doi=10.1109/ICASSP.2018.8462274| ISBN=978-1-5386-4658-8| S2CID= 52286352| access-date=2023-01-19}}</ref>和[[大数据]]（如[[基因组信号处理]]）<ref>{{cite journal |last1=Alter |first1=O. |last2=Brown |first2=P. O. |last3=Botstein |first3=D. |title=Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC27718/|journal=PNAS |volume=97 |issue=18 |pages=10101–10106 |date=September 2000 |doi=10.1073/pnas.97.18.10101 |pmid=10963673 |pmc=27718 |bibcode=2000PNAS...9710101A }}</ref><ref>{{cite journal |last1=Alter |first1=O. |last2=Golub |first2=G. H. |title=Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation Between DNA Replication and RNA Transcription |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC534520/|journal=PNAS |volume=101 |issue=47 |pages=16577–16582 |date=November 2004 |doi=10.1073/pnas.0406767101 |pmid=15545604 |pmc=534520 |bibcode=2004PNAS..10116577A }}</ref><ref>{{cite journal |last1=Alter |first1=O. |last2=Golub |first2=G. H. |title=Singular Value Decomposition of Genome-Scale mRNA Lengths Distribution Reveals Asymmetry in RNA Gel Electrophoresis Band Broadening |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524674/|journal=PNAS |volume=103 |issue=32 |pages=11828–11833 |date=August 2006 |doi=10.1073/pnas.0604756103 |pmid=16877539 |pmc=1524674 |bibcode=2006PNAS..10311828A }}</ref><ref>{{cite journal |last1=Bertagnolli |first1=N. M. |last2=Drake |first2=J. A. |last3=Tennessen |first3=J. M. |last4=Alter |first4=O. |title=SVD Identifies Transcript Length Distribution Functions from DNA Microarray Data and Reveals Evolutionary Forces Globally Affecting GBM Metabolism |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839928/|journal=PLOS ONE |volume=8 |issue=11 |pages=e78913 |date=November 2013 |doi=10.1371/journal.pone.0078913 |pmid=24282503 |pmc=3839928 |bibcode=2013PLoSO...878913B}}[https://www.alterlab.org/research/highlights/pone.0078913_Highlight.pdf Highlight]}</ref>

+

研究者已成功将SVD和伪逆应用于[[信号处理]] <ref>{{citation | last1=Sahidullah|first1= Md.| last2= Kinnunen| last3= Tomi| date=March 2016| title= "Local spectral variability features for speaker verification"| journal= Digital Signal Processing| volume= 50| pages= 1–11| doi=10.1016/j.dsp.2015.10.011}}</ref>、[[图像处理]] <ref>{{citation | last1=Mademlis| first1= Ioannis| last2=Tefas| first2=Anastasios| last3= Pitas| first3= Ioannis| date= 2018| title= "Regularized SVD-Based Video Frame Saliency for Unsupervised Activity Video Summarization"| url= https://ieeexplore.ieee.org/document/8462274| publisher= 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)| publisher=IEEE| pages= 2691–2695| doi=10.1109/ICASSP.2018.8462274| ISBN=978-1-5386-4658-8| S2CID= 52286352| access-date=2023-01-19}}</ref>和[[大数据]]（如[[基因组信号处理]]）<ref>{{cite journal |last1=Alter |first1=O. |last2=Brown |first2=P. O. |last3=Botstein |first3=D. |title=Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC27718/|journal=PNAS |volume=97 |issue=18 |pages=10101–10106 |date=September 2000 |doi=10.1073/pnas.97.18.10101 |pmid=10963673 |pmc=27718 |bibcode=2000PNAS...9710101A }}</ref><ref>{{cite journal |last1=Alter |first1=O. |last2=Golub |first2=G. H. |title=Integrative Analysis of Genome-Scale Data by Using Pseudoinverse Projection Predicts Novel Correlation Between DNA Replication and RNA Transcription |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC534520/|journal=PNAS |volume=101 |issue=47 |pages=16577–16582 |date=November 2004 |doi=10.1073/pnas.0406767101 |pmid=15545604 |pmc=534520 |bibcode=2004PNAS..10116577A }}</ref><ref>{{cite journal |last1=Alter |first1=O. |last2=Golub |first2=G. H. |title=Singular Value Decomposition of Genome-Scale mRNA Lengths Distribution Reveals Asymmetry in RNA Gel Electrophoresis Band Broadening |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1524674/|journal=PNAS |volume=103 |issue=32 |pages=11828–11833 |date=August 2006 |doi=10.1073/pnas.0604756103 |pmid=16877539 |pmc=1524674 |bibcode=2006PNAS..10311828A }}</ref><ref>{{cite journal |last1=Bertagnolli |first1=N. M. |last2=Drake |first2=J. A. |last3=Tennessen |first3=J. M. |last4=Alter |first4=O. |title=SVD Identifies Transcript Length Distribution Functions from DNA Microarray Data and Reveals Evolutionary Forces Globally Affecting GBM Metabolism |url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3839928/|journal=PLOS ONE |volume=8 |issue=11 |pages=e78913 |date=November 2013 |doi=10.1371/journal.pone.0078913 |pmid=24282503 |pmc=3839928 |bibcode=2013PLoSO...878913B}}[https://www.alterlab.org/research/highlights/pone.0078913_Highlight.pdf Highlight]}</ref>

。

===其他例子===

−

奇异值分解(SVD)~~在线性反问题（inverse problems）研究中广泛应用，分析Tikhonov正则化等方法时颇有助益。统计学界普遍使用它，与~~[[主成分分析]]（principal component analysis）和对应分析（correspondence analysis）密切相关，信号处理和模式识别领域也常见其身影。此外，它还用于仅输出模态分析（modal analysis），可从奇异向量确定非缩放模态形状（mode shapes）。自然语言文本处理中的潜在语义索引（latent semantic indexing）也离不开它。

+

奇异值分解(SVD)在线性[[反问题]]（inverse problems）研究中广泛应用，分析[[Tikhonov正则化]]等方法时颇有助益。统计学界普遍使用它，与[[主成分分析]]（principal component analysis）和[[对应分析]]（correspondence analysis）密切相关，[[信号处理]]和[[模式识别]]领域也常见其身影。此外，它还用于仅输出[[模态分析]]（modal analysis），可从奇异向量确定非缩放[[模态形状]]（mode shapes）。自然语言文本处理中的[[潜在语义索引]]（latent semantic indexing）也离不开它。

在涉及线性或线性化系统的一般数值计算中，常用一个普遍常数来刻画问题的规律性或奇异性，即系统的"条件数" <math>\kappa := \sigma_{\text{max}} / \sigma_{\text{min}}</math>。这个数值通常决定了给定计算方案在这些系统上的误差率或收敛速度<ref>{{citation | last1=Edelman | first1=Alan | date=1992 |url=https://math.mit.edu/~edelman/publications/distribution_of_a_scaled.pdf| title="On the distribution of a scaled condition number" | journal=Math. Comp. | volume=58 | issue=197 | pages=185–190 | doi=10.1090/S0025-5718-1992-1106966-2 | bibcode=1992MaCom..58..185E}}</ref><ref>{{citation | last1=Shen | first1=Jianhong (Jackie) | date=2001 |url=https://linkinghub.elsevier.com/retrieve/pii/S0024379500003220| title="On the singular values of Gaussian random matrices" | journal=Linear Alg. Appl. | volume=326 | issue=1–3 | pages=1–14 | doi=10.1016/S0024-3795(00)00322-0}}</ref>。

−

[[量子信息]]（quantum ~~information）领域中，SVD以Schmidt分解的形式发挥着关键作用。通过它，我们可以自然地分解两个量子系统的状态，从而提供了它们纠缠的充要条件：只要~~ <math>\mathbf{\Sigma}</math> 矩阵的秩大于1。

+

[[量子信息]]（quantum information）领域中，SVD以[[Schmidt分解]]的形式发挥着关键作用。通过它，我们可以自然地分解两个量子系统的状态，从而提供了它们[[纠缠]]的充要条件：只要 <math>\mathbf{\Sigma}</math> 矩阵的秩大于1。

−

~~数值天气预报（numerical~~ weather prediction）中，SVD对大型矩阵也有重要应用。利用Lanczos方法，可以估算在给定初始前向时间段内，对中心数值天气预报线性增长最快的几个扰动。这些扰动实际上是该时间间隔内全球天气线性化传播子对应最大奇异值的奇异向量。在这种情况下，输出奇异向量代表整个天气系统。随后，这些扰动通过完整的非线性模型运行，生成集合预报（ensemble forecast），为当前中心预测周围的不确定性提供了处理方法。

+

[[数值天气预报]]（numerical weather prediction）中，SVD对大型矩阵也有重要应用。利用[[Lanczos方法]]，可以估算在给定初始前向时间段内，对中心数值天气预报线性增长最快的几个扰动。这些扰动实际上是该时间间隔内全球天气线性化传播子对应最大奇异值的奇异向量。在这种情况下，输出奇异向量代表整个天气系统。随后，这些扰动通过完整的非线性模型运行，生成[[集合预报]]（ensemble forecast），为当前中心预测周围的不确定性提供了处理方法。

−

降阶建模（reduced-order ~~modeling）中也少不了SVD的身影。降阶建模旨在减少复杂系统中的自由度数量。研究人员将SVD与径向基函数（radial~~ basis functions）结合，用于插值三维非稳态流问题的解<ref>{{citation | last1=Walton | first1=S. | last2=Hassan | first2=O. | last3=Morgan | first3=K. | date=2013 |url=https://linkinghub.elsevier.com/retrieve/pii/S0307904X13002771| title="Reduced order modelling for unsteady fluid flow using proper orthogonal decomposition and radial basis functions" | journal=Applied Mathematical Modelling | volume=37 | issue=20–21 | pages=8930–8945 | doi=10.1016/j.apm.2013.04.025}}</ref>。

+

降阶建模（reduced-order modeling）中也少不了SVD的身影。降阶建模旨在减少复杂系统中的自由度数量。研究人员将SVD与[[径向基函数]]（radial basis functions）结合，用于插值三维非稳态流问题的解<ref>{{citation | last1=Walton | first1=S. | last2=Hassan | first2=O. | last3=Morgan | first3=K. | date=2013 |url=https://linkinghub.elsevier.com/retrieve/pii/S0307904X13002771| title="Reduced order modelling for unsteady fluid flow using proper orthogonal decomposition and radial basis functions" | journal=Applied Mathematical Modelling | volume=37 | issue=20–21 | pages=8930–8945 | doi=10.1016/j.apm.2013.04.025}}</ref>。

值得一提的是，科学家们已经利用SVD改进了地面引力波干涉仪aLIGO的引力波形建模（gravitational wave modeling）<ref>{{citation | last1=Setyawati | first1=Y. | last2=Ohme | first2=F. | last3=Khan | first3=S. | date=2019 | title="Enhancing gravitational waveform model through dynamic calibration" | journal=Physical Review D | volume=99 | issue=2 | pages=024010 | doi=10.1103/PhysRevD.99.024010 | arxiv=1810.07060 | bibcode=2019PhRvD..99b4010S | s2cid=118935941}}</ref>。SVD有助于提高波形生成的准确性和速度，支持引力波搜索和更新两种不同的波形模型。

[[推荐系统]]（Recommender systems）中，SVD用于预测用户对项目的评分<ref>{{citation | last1=Sarwar | first1=Badrul | last2=Karypis | first2=George | last3=Konstan | first3=Joseph A. | last4=Riedl | first4=John T. | date=2000 |url=https://files.grouplens.org/papers/webKDD00.pdf| title="Application of Dimensionality Reduction in Recommender System – A Case Study" | publisher=University of Minnesota}}</ref>。为了在商品机器集群上高效计算SVD，研究人员开发了分布式算法<ref>{{citation | last1=Bosagh Zadeh | first1=Reza | last2=Carlsson | first2=Gunnar | date=2013 |url=https://stanford.edu/~rezab/papers/dimsum.pdf| title="Dimension Independent Matrix Square Using MapReduce" | arxiv=1304.1467 | bibcode=2013arXiv1304.1467B}}</ref>。

−

低秩SVD在从时空数据中检测热点方面表现出色，已应用于疾病爆发检测<ref>{{citation | last1=Fanaee Tork | first1=Hadi | last2=Gama | first2=João | date=September 2014 | title="Eigenspace method for spatiotemporal hotspot detection" | journal=Expert Systems | volume=32 | issue=3 | pages=454–464 | doi=10.1111/exsy.12088 | arxiv=1406.3506 | bibcode=2014arXiv1406.3506F|s2cid=15476557}}</ref>。研究人员还将SVD和高阶SVD结合起来，用于疾病监测中从复杂数据流(具有空间和时间维度的多变量数据)进行实时事件检测<ref>{{citation | last1=Fanaee Tork | first1=Hadi | last2=Gama | first2=João | date=May 2015 | title="EigenEvent: An Algorithm for Event Detection from Complex Data Streams in Syndromic Surveillance" | journal=Intelligent Data Analysis | volume=19 | issue=3 | pages=597–616 | doi=10.3233/IDA-150734 | arxiv=1406.3496|s2cid=17966555}}</ref>。

−

~~在天体动力学（astrodynamics）领域，科学家们运用SVD及其变体来确定转移轨道设计~~<ref>{{citation | last1=Muralidharan | first1=Vivek | last2=Howell | first2=Kathleen | date=2023 | title="Stretching directions in cislunar space: Applications for departures and transfer design" | journal=Astrodynamics | volume=7 | issue=2 | pages=153–178 | doi=10.1007/s42064-022-0147-z |bibcode=2023AsDyn...7..153M}}</ref>和轨道站保持的合适机动方向<ref>{{citation | last1=Muralidharan | first1=Vivek | last2=Howell | first2=Kathleen | date=2022 | title="Leveraging stretching directions for stationkeeping in Earth-Moon halo orbits" | journal=Advances in Space Research | volume=69 | issue=1 | pages=620–646 | doi=10.1016/j.asr.2021.10.028 | bibcode=2022AdSpR..69..620M|s2cid=239490016}}</ref>。

+

低秩SVD在从时空数据中检测热点方面表现出色，已应用于疾病爆发检测<ref>{{citation | last1=Fanaee Tork | first1=Hadi | last2=Gama | first2=João | date=September 2014 | title="Eigenspace method for spatiotemporal hotspot detection" | journal=Expert Systems | volume=32 | issue=3 | pages=454–464 | doi=10.1111/exsy.12088 | arxiv=1406.3506 | bibcode=2014arXiv1406.3506F|s2cid=15476557}}</ref>。研究人员还将SVD和[[高阶SVD]]结合起来，用于疾病监测中从复杂数据流(具有空间和时间维度的多变量数据)进行实时事件检测<ref>{{citation | last1=Fanaee Tork | first1=Hadi | last2=Gama | first2=João | date=May 2015 | title="EigenEvent: An Algorithm for Event Detection from Complex Data Streams in Syndromic Surveillance" | journal=Intelligent Data Analysis | volume=19 | issue=3 | pages=597–616 | doi=10.3233/IDA-150734 | arxiv=1406.3496|s2cid=17966555}}</ref>。

+

在[[天体动力学]]（astrodynamics）领域，科学家们运用SVD及其变体来确定转移轨道设计<ref>{{citation | last1=Muralidharan | first1=Vivek | last2=Howell | first2=Kathleen | date=2023 | title="Stretching directions in cislunar space: Applications for departures and transfer design" | journal=Astrodynamics | volume=7 | issue=2 | pages=153–178 | doi=10.1007/s42064-022-0147-z |bibcode=2023AsDyn...7..153M}}</ref>和轨道站保持的合适机动方向<ref>{{citation | last1=Muralidharan | first1=Vivek | last2=Howell | first2=Kathleen | date=2022 | title="Leveraging stretching directions for stationkeeping in Earth-Moon halo orbits" | journal=Advances in Space Research | volume=69 | issue=1 | pages=620–646 | doi=10.1016/j.asr.2021.10.028 | bibcode=2022AdSpR..69..620M|s2cid=239490016}}</ref>。

==存在性证明==

−

矩阵 <math>\mathbf{M}</math> 的特征值 <math>\lambda</math> 满足代数关系 <math>\mathbf{M}\mathbf{u} = \lambda\mathbf{u}</math>。当 <math>\mathbf{M}</math> ~~是厄米特矩阵（Hermitian~~ matrix）时，我们还可以用变分特征来描述它。假设 <math>\mathbf{M}</math> 是一个 <math>n\times n</math> ~~的实数对称矩阵，我们定义函数：~~

+

矩阵 <math>\mathbf{M}</math> 的特征值 <math>\lambda</math> 满足代数关系 <math>\mathbf{M}\mathbf{u} = \lambda\mathbf{u}</math>。当 <math>\mathbf{M}</math> 是[[厄米特矩阵]]（Hermitian matrix）时，我们还可以用变分特征来描述它。假设 <math>\mathbf{M}</math> 是一个 <math>n\times n</math> 的实数[[对称矩阵]]，我们定义函数：

<math>

第315行：第316行：

</math>

−

~~极值定理告诉我们，当这个连续函数限制在单位球面~~ <math>\{\|\mathbf{x}\|=1\}</math> 上时，必在某点 <math>\mathbf{u}</math> ~~处达到最大值。根据拉格朗日乘数定理，~~<math>\mathbf{u}</math> 必然满足：

+

[[极值定理]]告诉我们，当这个连续函数限制在单位球面 <math>\{\|\mathbf{x}\|=1\}</math> 上时，必在某点 <math>\mathbf{u}</math> 处达到最大值。根据[[拉格朗日乘数]]定理，<math>\mathbf{u}</math> 必然满足：

<math>\nabla \mathbf{u}^{\operatorname{T}}\mathbf{M}\mathbf{u} - \lambda \cdot \nabla \mathbf{u}^{\operatorname{T}}\mathbf{u} = 0</math>

−

这里 <math>\lambda</math> 是某个实数，而 <math>\nabla</math> 表示相对于 <math>\mathbf{x}</math> ~~的微分算子。利用~~ <math>\mathbf{M}</math> 的对称性，我们得到：

+

这里 <math>\lambda</math> 是某个实数，而 <math>\nabla</math> 表示相对于 <math>\mathbf{x}</math> 的[[del]]微分算子。利用 <math>\mathbf{M}</math> 的对称性，我们得到：

<math>\nabla \mathbf{x}^{\operatorname{T}}\mathbf{M}\mathbf{x} - \lambda \cdot \nabla \mathbf{x}^{\operatorname{T}}\mathbf{x} = 2(\mathbf{M} - \lambda \mathbf{I})\mathbf{x}</math>

第333行：第334行：

===基于谱定理===

−

设 <math>\mathbf{M}</math> 为 <math>m\times n</math> 复矩阵。由于 <math>\mathbf{M}^*\mathbf{M}</math> ~~是正半定厄米特矩阵，根据谱定理，我们可以找到一个~~ <math>n\times n</math> 酉矩阵 <math>\mathbf{V}</math>，使得：

+

设 <math>\mathbf{M}</math> 为 <math>m\times n</math> 复矩阵。由于 <math>\mathbf{M}^*\mathbf{M}</math> 是正半定厄米特矩阵，根据[[谱定理]]，我们可以找到一个 <math>n\times n</math> 酉矩阵 <math>\mathbf{V}</math>，使得：

<math>\mathbf{V}^*\mathbf{M}^*\mathbf{M}\mathbf{V} = \bar{\mathbf{D}} = \begin{bmatrix}\mathbf{D} & 0 \\ 0 & 0\end{bmatrix},</math>

第415行：第416行：

设 <math>\mathbf{M}</math> 为 <math>m\times n</math> 实矩阵。定义 <math>S^{k-1}</math> 为 <math>\mathbb{R}^{k}</math> 中的单位 <math>(k-1)</math>-球面，并令 <math>\sigma(\mathbf{u},\mathbf{v})=\mathbf{u}^{\operatorname{T}}\mathbf{M}\mathbf{v}</math>，其中 <math>\mathbf{u} \in S^{m-1}</math>， <math>\mathbf{v} \in S^{n-1}</math>。

−

考虑函数 <math>\sigma</math> 在 <math>S^{m-1}\times S^{n-1}</math> 上的限制。由于 <math>S^{m-1}</math> 和 <math>S^{n-1}</math> ~~都是紧集（compact~~ sets），它们的乘积也是紧集。又因 <math>\sigma</math> 连续，它必在 <math>S^{m-1}</math> 中的某向量 <math>\mathbf{u}</math> 和 <math>S^{n-1}</math> 中的某向量 <math>\mathbf{v}</math> 处达到最大值。我们将这个最大值记为 <math>\sigma_1</math>，相应的向量记为 <math>\mathbf{u}_1</math> 和 <math>\mathbf{v}_1</math>。由于 <math>\sigma_1</math> 是 <math>\sigma(\mathbf{u},\mathbf{v})</math> 的最大值，它必为非负。若为负，改变 <math>\mathbf{u}_1</math> 或 <math>\mathbf{v}_1</math> 的符号就能使其变为正值，从而更大。

+

考虑函数 <math>\sigma</math> 在 <math>S^{m-1}\times S^{n-1}</math> 上的限制。由于 <math>S^{m-1}</math> 和 <math>S^{n-1}</math> 都是[[紧集]]（compact sets），它们的乘积也是紧集。又因 <math>\sigma</math> 连续，它必在 <math>S^{m-1}</math> 中的某向量 <math>\mathbf{u}</math> 和 <math>S^{n-1}</math> 中的某向量 <math>\mathbf{v}</math> 处达到最大值。我们将这个最大值记为 <math>\sigma_1</math>，相应的向量记为 <math>\mathbf{u}_1</math> 和 <math>\mathbf{v}_1</math>。由于 <math>\sigma_1</math> 是 <math>\sigma(\mathbf{u},\mathbf{v})</math> 的最大值，它必为非负。若为负，改变 <math>\mathbf{u}_1</math> 或 <math>\mathbf{v}_1</math> 的符号就能使其变为正值，从而更大。

现在我们提出以下陈述: <math>\mathbf{u}_1</math> 和 <math>\mathbf{v}_1</math> 是 <math>\mathbf{M}</math> 的左右奇异向量，对应的奇异值为 <math>\sigma_1</math>。

第451行：第452行：

===单边雅可比算法（One-sided Jacobi algorithm）===

−

单边雅可比算法是一种迭代算法<ref>{{cite journal |last1=Rijk |first1=P.P.M. de |title=A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer |journal=SIAM J. Sci. Stat. Comput. |volume=10 |pages=359 |year=1989 }}</ref>~~，它通过反复转换使矩阵列正交化。其基本迭代步骤由雅可比旋转（Jacobi~~ rotation）给出：

+

单边雅可比算法是一种迭代算法<ref>{{cite journal |last1=Rijk |first1=P.P.M. de |title=A one-sided Jacobi algorithm for computing the singular value decomposition on a vector computer |journal=SIAM J. Sci. Stat. Comput. |volume=10 |pages=359 |year=1989 }}</ref>，它通过反复转换使矩阵列正交化。其基本迭代步骤由[[雅可比旋转]]（Jacobi rotation）给出：

<math> M \leftarrow MJ(p,q,\theta), </math>

第457行：第458行：

这里，我们选择雅可比旋转矩阵<math>J(p,q,\theta)</math>的角度<math>\theta</math>，使旋转后第<math>p</math>列和第<math>q</math>列正交。指标<math>(p,q)</math>按<math>(p=1\dots m,q=p+1\dots m)</math>循环扫描，其中<math>m</math>为列数。

−

算法收敛后，我们可以这样恢复奇异值分解<math>M=USV^{T}</math>：矩阵<math>V</math>是雅可比旋转矩阵的累积，将转换后矩阵<math>M</math>~~的列归一化得到矩阵~~<math>U</math>，而奇异值则由转换后矩阵<math>M</math>列的范数给出。

+

算法收敛后，我们可以这样恢复奇异值分解<math>M=USV^{T}</math>：矩阵<math>V</math>是雅可比旋转矩阵的累积，将转换后矩阵<math>M</math>的列[[归一化]]得到矩阵<math>U</math>，而奇异值则由转换后矩阵<math>M</math>列的范数给出。

===双边雅可比算法（Two-sided Jacobi algorithm）===

−

~~双边雅可比SVD算法是雅可比特征值算法（Jacobi~~ eigenvalue ~~algorithm）的推广，它通过迭代将方阵转换为对角矩阵。对于非方阵，我们先进行QR分解，然后对~~<math>R</math>~~矩阵应用此算法。其基本迭代步骤如下：先用Givens旋转使一对非对角元素对称，再用雅可比变换（Jacobi~~ transformation）使它们归零，

+

双边雅可比SVD算法是[[雅可比特征值算法]]（Jacobi eigenvalue algorithm）的推广，它通过迭代将方阵转换为对角矩阵。对于非方阵，我们先进行[[QR分解]]，然后对<math>R</math>矩阵应用此算法。其基本迭代步骤如下：先用[[Givens旋转]]使一对非对角元素对称，再用[[雅可比变换]]（Jacobi transformation）使它们归零，

<math> M \leftarrow J^{T}GMJ </math>

第473行：第474行：

我们可以利用以下观察结果计算奇异值分解：

−

* <math>\mathbf{M}</math>的左奇异向量是<math>\mathbf{M}\mathbf{M}^{*}</math>~~的一组正交归一化特征向量。~~

+

* <math>\mathbf{M}</math>的左奇异向量是<math>\mathbf{M}\mathbf{M}^{*}</math>的一组[[正交归一化]]特征向量。

* <math>\mathbf{M}</math>的右奇异向量是<math>\mathbf{M}^{*}\mathbf{M}</math>的一组正交归一化特征向量。

−

* <math>\mathbf{M}</math>的非零奇异值（在<math>\mathbf{\Sigma}</math>的对角线上）等于<math>\mathbf{M}^{*}\mathbf{M}</math>和<math>\mathbf{M}\mathbf{M}^{*}</math>~~的非零特征值的平方根。~~

+

* <math>\mathbf{M}</math>的非零奇异值（在<math>\mathbf{\Sigma}</math>的对角线上）等于<math>\mathbf{M}^{*}\mathbf{M}</math>和<math>\mathbf{M}\mathbf{M}^{*}</math>的非零[[特征值]]的平方根。

−

通常，我们通过两步计算矩阵<math>\mathbf{M}</math>~~的SVD。第一步将矩阵简化为双对角矩阵（bidiagonal~~ matrix），假设<math>m \geq n</math>，需要<math>O(mn^{2})</math>~~次浮点运算（flop）。第二步计算双对角矩阵的SVD，只能通过迭代方法完成。实际上，计算到机器精度就足够了。如果将这个精度视为常数，第二步需要~~<math>O(n)</math>次迭代，每次迭代花费<math>O(n)</math>次flop。因此，第一步耗时更长，总体成本为<math>O(mn^{2})</math>次flop<ref name = "source2">{{cite book |last1=Trefethen |first1=Lloyd N. |last2=Bau |first2=David III |title=Numerical Linear Algebra |publisher=Society for Industrial and Applied Mathematics |year=1997 |location=Philadelphia |isbn=978-0-89871-361-9}}</ref>。

+

通常，我们通过两步计算矩阵<math>\mathbf{M}</math>的SVD。第一步将矩阵简化为[[双对角矩阵]]（bidiagonal matrix），假设<math>m \geq n</math>，需要<math>O(mn^{2})</math>次浮点运算（flop）。第二步计算双对角矩阵的SVD，只能通过迭代方法完成。实际上，计算到[[机器精度]]就足够了。如果将这个精度视为常数，第二步需要<math>O(n)</math>次迭代，每次迭代花费<math>O(n)</math>次flop。因此，第一步耗时更长，总体成本为<math>O(mn^{2})</math>次flop<ref name = "source2">{{cite book |last1=Trefethen |first1=Lloyd N. |last2=Bau |first2=David III |title=Numerical Linear Algebra |publisher=Society for Industrial and Applied Mathematics |year=1997 |location=Philadelphia |isbn=978-0-89871-361-9}}</ref>。

−

~~如果只需计算奇异值，第一步可用Householder反射以~~<math>4mn^{2}-4n^{3}/3</math>次flop完成。当<math>m</math>远大于<math>n</math>~~时，先用QR分解将矩阵~~<math>\mathbf{M}</math>简化为三角矩阵，再用Householder反射进一步简化为双对角形式更有优势；总成本为<math>2mn^{2}+2n^{3}</math>次flop<ref name = "source2"/>。

+

如果只需计算奇异值，第一步可用[[Householder反射]]以<math>4mn^{2}-4n^{3}/3</math>次flop完成。当<math>m</math>远大于<math>n</math>时，先用[[QR分解]]将矩阵<math>\mathbf{M}</math>简化为三角矩阵，再用Householder反射进一步简化为双对角形式更有优势；总成本为<math>2mn^{2}+2n^{3}</math>次flop<ref name = "source2"/>。

−

~~第二步可用QR算法的变体完成，该变体由Golub~~ & Kahan<ref name = "source3">{{cite journal |last1=Golub |first1=Gene H. |last2=Kahan |first2=William |title=Calculating the singular values and pseudo-inverse of a matrix |journal=Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis |volume=2 |issue=2 |pages=205–224 |year=1965 |doi=10.1137/0702016 |jstor=2949777 |bibcode=1965SJNA....2..205G}}</ref>~~首次描述。LAPACK子程序DBDSQR~~<ref name = "source1">{{cite web |title=Netlib.org |url=http://www.netlib.org/ }}</ref>实现了这种迭代方法，并针对奇异值非常小的情况进行了改进<ref>{{cite journal |last1=Demmel |first1=James |last2=Kahan |first2=William |title=Accurate singular values of bidiagonal matrices |journal=SIAM Journal on Scientific and Statistical Computing |volume=11 |issue=5 |pages=873–912 |year=1990 |doi=10.1137/0911052 |citeseerx=10.1.1.48.3740}}</ref>。结合使用Householder反射的第一步和适当情况下的QR分解，构成了计算奇异值分解的DGESVD<ref name="source1"/>例程。

+

第二步可用[[QR算法]]的变体完成，该变体由Golub & Kahan<ref name = "source3">{{cite journal |last1=Golub |first1=Gene H. |last2=Kahan |first2=William |title=Calculating the singular values and pseudo-inverse of a matrix |journal=Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis |volume=2 |issue=2 |pages=205–224 |year=1965 |doi=10.1137/0702016 |jstor=2949777 |bibcode=1965SJNA....2..205G}}</ref>首次描述。[[LAPACK]]子程序DBDSQR<ref name = "source1">{{cite web |title=Netlib.org |url=http://www.netlib.org/ }}</ref>实现了这种迭代方法，并针对奇异值非常小的情况进行了改进<ref>{{cite journal |last1=Demmel |first1=James |last2=Kahan |first2=William |title=Accurate singular values of bidiagonal matrices |journal=SIAM Journal on Scientific and Statistical Computing |volume=11 |issue=5 |pages=873–912 |year=1990 |doi=10.1137/0911052 |citeseerx=10.1.1.48.3740}}</ref>。结合使用Householder反射的第一步和适当情况下的QR分解，构成了计算奇异值分解的DGESVD<ref name="source1"/>例程。

−

~~GNU科学库（GSL）也实现了相同算法，并提供了一种替代方法，在第2步中使用单边雅可比正交化~~<ref>{{cite web |url=https://www.gnu.org/software/gsl/doc/html/|title=§14.4 Singular Value Decomposition |author=GSL Team |year=2007 |work=GNU Scientific Library Reference Manual}}</ref>。这种方法通过求解一系列<math>2\times 2</math>的SVD问题来计算双对角矩阵的SVD，类似于雅可比特征值算法求解一系列<math>2\times 2</math>的特征值问题<ref>{{cite book |last1=Golub |first1=Gene H. |last2=Van Loan |first2=Charles F. |title=Matrix Computations |edition=3rd |publisher=Johns Hopkins |year=1996 |isbn=978-0-8018-5414-9}}</ref>~~。第2步的另一种方法借鉴了分治特征值算法（divide~~-and-conquer eigenvalue algorithms）的思想<ref name = "source2"/>。

+

[[GNU科学库]]（GSL）也实现了相同算法，并提供了一种替代方法，在第2步中使用单边雅可比正交化<ref>{{cite web |url=https://www.gnu.org/software/gsl/doc/html/|title=§14.4 Singular Value Decomposition |author=GSL Team |year=2007 |work=GNU Scientific Library Reference Manual}}</ref>。这种方法通过求解一系列<math>2\times 2</math>的SVD问题来计算双对角矩阵的SVD，类似于雅可比特征值算法求解一系列<math>2\times 2</math>的特征值问题<ref>{{cite book |last1=Golub |first1=Gene H. |last2=Van Loan |first2=Charles F. |title=Matrix Computations |edition=3rd |publisher=Johns Hopkins |year=1996 |isbn=978-0-8018-5414-9}}</ref>。第2步的另一种方法借鉴了[[分治特征值算法]]（divide-and-conquer eigenvalue algorithms）的思想<ref name = "source2"/>。

还有一种不显式使用特征值分解的替代方法<ref>{{cite web |title=Simple SVD |url=https://mathworks.co.kr/matlabcentral/fileexchange/12674-simple-svd |website=MATLAB Central File Exchange }}</ref>。通常，我们将矩阵<math>\mathbf{M}</math>的奇异值问题转换为等价的对称特征值问题，如<math>\mathbf{M}\mathbf{M}^{*}</math>，<math>\mathbf{M}^{*}\mathbf{M}</math>，或 <math> \begin{bmatrix}\mathbf{0} &\mathbf{M} \\\mathbf{M}^{*}&\mathbf{0} \end{bmatrix}. </math>

−

使用特征值分解的方法基于QR算法，该算法已发展得稳定且快速。注意，奇异值是实数，右奇异向量和左奇异向量不需要形成相似变换。我们可以在QR分解和LQ分解之间迭代交替以找到实对角Hermitian矩阵。QR分解给出<math>\mathbf{M} \Rightarrow \mathbf{Q}\mathbf{R}</math>，<math>\mathbf{R}</math>的LQ分解给出<math>\mathbf{R} \Rightarrow \mathbf{L}\mathbf{P}^{*}</math>。因此，每次迭代中，我们有<math>\mathbf{M} \Rightarrow \mathbf{Q}\mathbf{L}\mathbf{P}^{*}</math>，更新<math>\mathbf{M} \Leftarrow \mathbf{L}</math>并重复正交化。最终，QR分解和LQ分解之间的这种迭代产生左右酉奇异矩阵。这种方法不能像QR算法那样通过谱移位或缩减来加速，因为没有使用相似变换，移位方法不易定义。然而，这种迭代方法实现简单，当速度不重要时是个不错的选择。它还提供了纯正交/酉变换如何获得SVD的洞察。

+

使用特征值分解的方法基于[[QR算法]]，该算法已发展得稳定且快速。注意，奇异值是实数，右奇异向量和左奇异向量不需要形成相似变换。我们可以在[[QR分解]]和[[LQ分解]]之间迭代交替以找到实对角[[Hermitian矩阵]]。[[QR分解]]给出<math>\mathbf{M} \Rightarrow \mathbf{Q}\mathbf{R}</math>，<math>\mathbf{R}</math>的LQ分解给出<math>\mathbf{R} \Rightarrow \mathbf{L}\mathbf{P}^{*}</math>。因此，每次迭代中，我们有<math>\mathbf{M} \Rightarrow \mathbf{Q}\mathbf{L}\mathbf{P}^{*}</math>，更新<math>\mathbf{M} \Leftarrow \mathbf{L}</math>并重复正交化。最终，[[QR分解]]和[[LQ分解]]之间的这种迭代产生左右酉奇异矩阵。这种方法不能像QR算法那样通过谱移位或缩减来加速，因为没有使用相似变换，移位方法不易定义。然而，这种迭代方法实现简单，当速度不重要时是个不错的选择。它还提供了纯正交/酉变换如何获得SVD的洞察。

===2 × 2 SVD的解析结果===

第493行：第494行：

解析<math>2\times 2</math>，我们可以找到矩阵的奇异值。设矩阵为<math>\mathbf{M} =z_{0}\mathbf{I} +z_{1}\sigma_{1}+z_{2}\sigma_{2}+z_{3}\sigma_{3}</math>

−

其中<math>z_{i}\in \mathbb{C}</math>是参数化矩阵的复数，<math>\mathbf{I}</math>是单位矩阵，<math>\sigma_{i}</math>~~表示泡利矩阵（Pauli~~ matrices）。那么它的两个奇异值由以下给出：

+

其中<math>z_{i}\in \mathbb{C}</math>是参数化矩阵的复数，<math>\mathbf{I}</math>是单位矩阵，<math>\sigma_{i}</math>表示[[泡利矩阵]]（Pauli matrices）。那么它的两个奇异值由以下给出：

<math> \begin{aligned}

第541行：第542行：

这里矩阵<math>\mathbf{U}_t</math>为<math>m \times t</math>，<math>\mathbf{\Sigma}_t</math>为<math>t \times t</math>对角矩阵，<math>\mathbf{V}_t^*</math>为<math>t \times n</math>。我们只计算<math>\mathbf{U}</math>的t个列向量和<math>\mathbf{V}^*</math>的t个行向量，它们对应<math>\mathbf{\Sigma}_t</math>中最大的t个奇异值。当<math>t \ll r</math>时，这比紧凑SVD更快更省，但需要一套完全不同的数值求解器工具。

−

在需要近似矩阵<math>\mathbf{M}</math>~~的Moore–Penrose逆的应用中，~~<math>\mathbf{M}</math>的最小奇异值是有意义的，但与最大奇异值相比，它们更难计算。

+

在需要近似矩阵<math>\mathbf{M}</math>的[[Moore–Penrose逆]]的应用中，<math>\mathbf{M}</math>的最小奇异值是有意义的，但与最大奇异值相比，它们更难计算。

−

~~截断SVD在潜在语义索引中得到了应用~~<ref>{{cite journal |last1=Chicco |first1=D |last2=Masseroli |first2=M |title=Software suite for gene and protein annotation prediction and similarity search |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |url=https://ieeexplore.ieee.org/document/6987347|volume=12 |issue=4 |pages=837–843 |year=2015 |doi=10.1109/TCBB.2014.2382127 |pmid=26357324 |s2cid=14714823 |hdl=11311/959408 }}</ref>

+

截断SVD在[[潜在语义索引]]中得到了应用<ref>{{cite journal |last1=Chicco |first1=D |last2=Masseroli |first2=M |title=Software suite for gene and protein annotation prediction and similarity search |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |url=https://ieeexplore.ieee.org/document/6987347|volume=12 |issue=4 |pages=837–843 |year=2015 |doi=10.1109/TCBB.2014.2382127 |pmid=26357324 |s2cid=14714823 |hdl=11311/959408 }}</ref>

。

第550行：第551行：

===Ky Fan范数===

−

我们把<math>\mathbf{M}</math>的k个最大奇异值之和称为<math>\mathbf{M}</math>~~的Ky~~ Fan k-~~范数，这是一种矩阵范数（matrix~~ norm）<ref>{{citation | last1=Fan | first1=Ky | date=1951 |url=https://pmc.ncbi.nlm.nih.gov/articles/PMC1063464/| title="Maximum properties and inequalities for the eigenvalues of completely continuous operators" | journal=Proceedings of the National Academy of Sciences of the United States of America | volume=37 | issue=11 | pages=760–766 | doi=10.1073/pnas.37.11.760 | pmc=1063464 | pmid=16578416 | bibcode=1951PNAS...37..760F}}</ref>。

+

我们把<math>\mathbf{M}</math>的k个最大奇异值之和称为<math>\mathbf{M}</math>的[[Ky Fan]] k-范数，这是一种[[矩阵范数]]（matrix norm）<ref>{{citation | last1=Fan | first1=Ky | date=1951 |url=https://pmc.ncbi.nlm.nih.gov/articles/PMC1063464/| title="Maximum properties and inequalities for the eigenvalues of completely continuous operators" | journal=Proceedings of the National Academy of Sciences of the United States of America | volume=37 | issue=11 | pages=760–766 | doi=10.1073/pnas.37.11.760 | pmc=1063464 | pmid=16578416 | bibcode=1951PNAS...37..760F}}</ref>。

−

Ky Fan范数中的第一个，即Ky Fan 1-范数，等同于<math>\mathbf{M}</math>作为线性算子相对于<math>K^m</math>和<math>K^n</math>~~的欧几里得范数的算子范数（operator~~ norm）。换言之，Ky Fan 1-范数就是标准<math>\ell^2</math>欧几里得内积诱导的算子范数。我们很容易就能验证Ky Fan 1-范数和奇异值之间的关系。一般来说，对于（可能是无限维）希尔伯特空间上的有界算子<math>\mathbf{M}</math>，我们有：

+

Ky Fan范数中的第一个，即Ky Fan 1-范数，等同于<math>\mathbf{M}</math>作为线性算子相对于<math>K^m</math>和<math>K^n</math>的欧几里得范数的[[算子范数]]（operator norm）。换言之，Ky Fan 1-范数就是标准<math>\ell^2</math>欧几里得内积诱导的算子范数。我们很容易就能验证Ky Fan 1-范数和奇异值之间的关系。一般来说，对于（可能是无限维）希尔伯特空间上的有界算子<math>\mathbf{M}</math>，我们有：

<math>\|\mathbf{M}\| = \|\mathbf{M}^*\mathbf{M}\|^{\frac{1}{2}}</math>

−

在矩阵情况下，<math>(\mathbf{M}^*\mathbf{M})^{1/2}</math>~~是个正规矩阵，所以~~<math>\|\mathbf{M}^*\mathbf{M}\|^{1/2}</math>就是<math>(\mathbf{M}^*\mathbf{M})^{1/2}</math>的最大特征值，也就是<math>\mathbf{M}</math>的最大奇异值。

+

在矩阵情况下，<math>(\mathbf{M}^*\mathbf{M})^{1/2}</math>是个[[正规矩阵]]，所以<math>\|\mathbf{M}^*\mathbf{M}\|^{1/2}</math>就是<math>(\mathbf{M}^*\mathbf{M})^{1/2}</math>的最大特征值，也就是<math>\mathbf{M}</math>的最大奇异值。

−

Ky ~~Fan范数中的最后一个，即所有奇异值的和，我们称之为迹范数（trace~~ norm）（也叫核范数），定义为<math>\|\mathbf{M}\| = \operatorname{Tr}(\mathbf{M}^*\mathbf{M})^{1/2}</math>（<math>\mathbf{M}^*\mathbf{M}</math>的特征值是奇异值的平方）。

+

Ky Fan范数中的最后一个，即所有奇异值的和，我们称之为[[迹范数]]（trace norm）（也叫核范数），定义为<math>\|\mathbf{M}\| = \operatorname{Tr}(\mathbf{M}^*\mathbf{M})^{1/2}</math>（<math>\mathbf{M}^*\mathbf{M}</math>的特征值是奇异值的平方）。

===希尔伯特-施密特范数===

−

奇异值还与算子空间上的另一个范数有关。我们来看看<math>n \times n</math>~~矩阵上的希尔伯特~~-~~施密特内积（Hilbert–Schmidt~~ inner product），它的定义是：

+

奇异值还与算子空间上的另一个范数有关。我们来看看<math>n \times n</math>矩阵上的[[希尔伯特-施密特]]内积（Hilbert–Schmidt inner product），它的定义是：

<math>\langle \mathbf{M}, \mathbf{N} \rangle = \operatorname{tr}(\mathbf{N}^*\mathbf{M}).</math>

第574行：第575行：

<math>\|\mathbf{M}\| = \sqrt{\sum_i \sigma_i^2}</math>

−

这里<math>\sigma_i</math>是<math>\mathbf{M}</math>的奇异值。我们把这个称为<math>\mathbf{M}</math>~~的Frobenius范数、Schatten~~ 2-范数或希尔伯特-施密特范数。直接计算可以发现，<math>\mathbf{M} = (m_{ij})</math>的Frobenius范数与下式一致：

+

这里<math>\sigma_i</math>是<math>\mathbf{M}</math>的奇异值。我们把这个称为<math>\mathbf{M}</math>的[[Frobenius范数]]、Schatten 2-范数或希尔伯特-施密特范数。直接计算可以发现，<math>\mathbf{M} = (m_{ij})</math>的Frobenius范数与下式一致：

第590行：第591行：

===希尔伯特空间上的有界算子===

−

我们可以将分解 <math>\mathbf{M} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math> 推广到可分希尔伯特空间 <math>H</math> ~~上的有界算子（bounded~~ operator）<math>\mathbf{M}</math>。具体来说，对任何有界算子 <math>\mathbf{M}</math>~~，都存在部分等距（partial~~ isometry）算子 <math>\mathbf{U}</math>，酉算子 <math>\mathbf{V}</math>，测度空间 <math>(X,\mu)</math>，以及非负可测函数 <math>f</math>，使得：

+

我们可以将分解 <math>\mathbf{M} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^*</math> 推广到可分希尔伯特空间 <math>H</math> 上的[[有界算子]]（bounded operator）<math>\mathbf{M}</math>。具体来说，对任何有界算子 <math>\mathbf{M}</math>，都存在[[部分等距]]（partial isometry）算子 <math>\mathbf{U}</math>，酉算子 <math>\mathbf{V}</math>，测度空间 <math>(X,\mu)</math>，以及非负可测函数 <math>f</math>，使得：

<math>\mathbf{M} = \mathbf{U}T_f\mathbf{V}^*</math>

第596行：第597行：

这里 <math>T_f</math> 是 <math>L^2(X,\mu)</math> 上的 <math>f</math> 乘法算子。

−

我们可以通过模仿上述矩阵情况的线性代数论证来证明这一点。<math>\mathbf{V}T_f\mathbf{V}^*</math> 是 <math>\mathbf{M}^*\mathbf{M}</math> 的唯一正平方根，这由自伴算子（self-adjoint ~~operators）的Borel函数演算给出。~~<math>\mathbf{U}</math> 不必是酉的原因是，与有限维情况不同，给定具有非平凡核的等距算子 <math>U_1</math>，可能找不到合适的 <math>U_2</math> 使得：

+

我们可以通过模仿上述矩阵情况的线性代数论证来证明这一点。<math>\mathbf{V}T_f\mathbf{V}^*</math> 是 <math>\mathbf{M}^*\mathbf{M}</math> 的唯一正平方根，这由自伴算子（self-adjoint operators）的[[Borel函数演算]] (Borel functional calculus) 给出。<math>\mathbf{U}</math> 不必是酉的原因是，与有限维情况不同，给定具有非平凡核的等距算子 <math>U_1</math>，可能找不到合适的 <math>U_2</math> 使得：

<math>\begin{bmatrix}U_1\\U_2\end{bmatrix}</math>

第602行：第603行：

成为一个酉算子。

−

~~与矩阵类似，奇异值分解等价于算子的极分解：我们可以简单地写作：~~

+

与矩阵类似，奇异值分解等价于算子的[[极分解]]：我们可以简单地写作：

<math>\mathbf{M} = \mathbf{U}\mathbf{V}^* \cdot \mathbf{V}T_f\mathbf{V}^*</math>

第616行：第617行：

这里级数在 <math>H</math> 的范数拓扑中收敛。注意，这与有限维情况的表达式相似。我们称 <math>\sigma_i</math> 为 <math>\mathbf{M}</math> 的奇异值，<math>\{\mathbf{U}e_i\}</math> 和 <math>\{\mathbf{U}e_i\}</math> 分别为 <math>\mathbf{M}</math> 的左奇异和右奇异向量。

−

~~希尔伯特空间上的紧算子是有限秩算子（finite~~ rank operator）在一致算子拓扑中的闭包。上述级数表达式给出了这种表示的一个明确例子。这直接导出以下结论：

+

希尔伯特空间上的紧算子是[[有限秩算子]]（finite rank operator）在一致算子拓扑中的闭包。上述级数表达式给出了这种表示的一个明确例子。这直接导出以下结论：

定理：<math>\mathbf{M}</math> 是紧的当且仅当 <math>\mathbf{M}^*\mathbf{M}</math> 是紧的。

第622行：第623行：

==历史==

−

奇异值分解最初源于微分几何学家的研究。他们希望确定能否通过对两个作用空间进行独立正交变换，使一个实双线性形式等同于另一个。1873年和1874年，Eugenio Beltrami 和 Camille Jordan 分别独立发现，双线性形式（用矩阵表示）的奇异值构成了正交替代下的完整不变量集（complete set of invariants）。1889年，James Joseph Sylvester 也独立得出了实方阵的奇异值分解，他将奇异值称为矩阵 <math>\mathbf{A}</math> 的规范乘子。1915年，Autonne ~~成为第四位独立发现者，他通过极分解推导出了奇异值分解。直到1936年，Carl~~ Eckart 和 Gale J. Young 才首次证明了矩形和复矩阵的奇异值分解<ref>{{cite journal |last1=Eckart |first1=C. |last2=Young |first2=G. |title=The approximation of one matrix by another of lower rank |journal=Psychometrika |volume=1 |issue=3 |pages=211–8 |year=1936 |doi=10.1007/BF02288367 |s2cid=10163399 }}</ref>~~，他们将其视为厄米特矩阵主轴变换的推广。~~

+

奇异值分解最初源于[[微分几何]]学家的研究。他们希望确定能否通过对两个作用空间进行独立正交变换，使一个实[[双线性形式]]等同于另一个。1873年和1874年，Eugenio Beltrami 和 Camille Jordan 分别独立发现，双线性形式（用矩阵表示）的奇异值构成了正交替代下的完整不变量集（complete set of invariants）。1889年，James Joseph Sylvester 也独立得出了实方阵的奇异值分解，他将奇异值称为矩阵 <math>\mathbf{A}</math> 的规范乘子。1915年，Autonne 成为第四位独立发现者，他通过[[极分解]]推导出了奇异值分解。直到1936年，Carl Eckart 和 Gale J. Young 才首次证明了矩形和复矩阵的奇异值分解<ref>{{cite journal |last1=Eckart |first1=C. |last2=Young |first2=G. |title=The approximation of one matrix by another of lower rank |journal=Psychometrika |volume=1 |issue=3 |pages=211–8 |year=1936 |doi=10.1007/BF02288367 |s2cid=10163399 }}</ref>，他们将其视为[[厄米特矩阵]][[主轴]]变换的推广。

−

1907年，Erhard Schmidt ~~为积分算子（integral~~ operators）（在某些弱技术假设下为紧算子）定义了类似奇异值的概念，他似乎不知道有关有限矩阵奇异值的平行研究。1910年，Émile Picard 进一步发展了这一理论，并首次将 <math>\sigma_k</math> 称为奇异值。

+

1907年，Erhard Schmidt 为[[积分算子]]（integral operators）（在某些弱技术假设下为紧算子）定义了类似奇异值的概念，他似乎不知道有关有限矩阵奇异值的平行研究。1910年，Émile Picard 进一步发展了这一理论，并首次将 <math>\sigma_k</math> 称为奇异值。

−

计算奇异值分解的实用方法可追溯到1954-1955年 Kogbetliantz 和1958年 Hestenes 的工作<ref>{{cite journal |last1=Hestenes |first1=M. R. |title=Inversion of Matrices by Biorthogonalization and Related Results |journal=Journal of the Society for Industrial and Applied Mathematics |volume=6 |issue=1 |pages=51–90 |year=1958 |doi=10.1137/0106005 |jstor=2098862 |mr=0092215 }}</ref>。这些方法与使用平面旋转或 Givens ~~旋转的~~ Jacobi ~~特征值算法极为相似。然而，1965年~~ Gene Golub 和 William Kahan 发表的方法<ref name = "source3"/>取代了之前的方法，他们采用了 Householder ~~变换或反射。1970年，Golub~~ 和 Christian Reinsch<ref>{{cite journal |last1=Golub |first1=G. H. |last2=Reinsch |first2=C. |title=Singular value decomposition and least squares solutions |journal=Numerische Mathematik |volume=14 |issue=5 |pages=403–420 |year=1970 |doi=10.1007/BF02163027 |mr=1553974 |s2cid=123532178 }}</ref> 发表了 Golub/Kahan 算法的一个变体，这个变体至今仍广泛应用。

+

计算奇异值分解的实用方法可追溯到1954-1955年 Kogbetliantz 和1958年 Hestenes 的工作<ref>{{cite journal |last1=Hestenes |first1=M. R. |title=Inversion of Matrices by Biorthogonalization and Related Results |journal=Journal of the Society for Industrial and Applied Mathematics |volume=6 |issue=1 |pages=51–90 |year=1958 |doi=10.1137/0106005 |jstor=2098862 |mr=0092215 }}</ref>。这些方法与使用平面旋转或 [[Givens 旋转]]的 [[Jacobi 特征值算法]]极为相似。然而，1965年 Gene Golub 和 William Kahan 发表的方法<ref name = "source3"/>取代了之前的方法，他们采用了 [[Householder 变换]]或反射。1970年，Golub 和 Christian Reinsch<ref>{{cite journal |last1=Golub |first1=G. H. |last2=Reinsch |first2=C. |title=Singular value decomposition and least squares solutions |journal=Numerische Mathematik |volume=14 |issue=5 |pages=403–420 |year=1970 |doi=10.1007/BF02163027 |mr=1553974 |s2cid=123532178 }}</ref> 发表了 Golub/Kahan 算法的一个变体，这个变体至今仍广泛应用。

==参考文献==

相信未来

2,464

个编辑