'''Sequential composition.''' If we query an ε-differential privacy mechanism <math>t</math> times, and the randomization of the mechanism is independent for each query, then the result would be <math>\epsilon t</math>-differentially private. In the more general case, if there are <math>n</math> independent mechanisms <math>\mathcal{M}_1,\dots,\mathcal{M}_n</math>, whose privacy guarantees are <math>\epsilon_1,\dots,\epsilon_n</math>-differential privacy, respectively, then any function <math>g</math> of them, <math>g(\mathcal{M}_1,\dots,\mathcal{M}_n)</math>, is <math>\left(\sum\limits_{i=1}^{n} \epsilon_i\right)</math>-differentially private.<ref name="PINQ" />

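As an illustration of this accounting, here is a minimal Python sketch; the <code>laplace_count</code> helper and the toy records are assumptions made for the example, not part of any standard library:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def laplace_count(data, predicate, epsilon):
    """Counting query (sensitivity 1) released with Laplace noise of
    scale 1/epsilon, so each call is epsilon-differentially private."""
    true_count = sum(1 for row in data if predicate(row))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Three independent queries against the same database: by sequential
# composition the total privacy cost is the sum of the epsilons.
data = [{"age": 34, "diabetes": True}, {"age": 51, "diabetes": False}]
epsilons = [0.1, 0.1, 0.3]
answers = [laplace_count(data, lambda r: r["diabetes"], e) for e in epsilons]
total_epsilon = sum(epsilons)  # the combined release is 0.5-differentially private
</syntaxhighlight>
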
'''Parallel composition.''' If the previous mechanisms are computed on ''disjoint'' subsets of the private database then the function <math>g</math> would be <math>(\max_i \epsilon_i)</math>-differentially private instead.<ref name="PINQ" />

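A matching sketch of the disjoint case, again with made-up data and helper names:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def noisy_size(rows, epsilon):
    # Set-size query (sensitivity 1) with Laplace noise of scale 1/epsilon.
    return len(rows) + rng.laplace(scale=1.0 / epsilon)

people = [{"age": 34}, {"age": 51}, {"age": 67}]
# Disjoint partition: every individual falls in exactly one bracket.
under_50 = [r for r in people if r["age"] < 50]
over_50 = [r for r in people if r["age"] >= 50]

answers = [noisy_size(under_50, 0.5), noisy_size(over_50, 0.5)]
# Parallel composition: releasing both answers costs max(0.5, 0.5) = 0.5,
# not 0.5 + 0.5, because each mechanism sees a disjoint subset.
</syntaxhighlight>
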
===Robustness to post-processing===
Together, [[#Composability|composability]] and [[#Robustness to post-processing|robustness to post-processing]] permit modular construction and analysis of differentially private mechanisms and motivate the concept of the ''privacy loss budget''. If all elements of a complex mechanism that access sensitive data are separately differentially private, then so is their combination, followed by arbitrary post-processing.

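For instance, a minimal sketch in which the only access to the sensitive data is one Laplace-noised count; the clamping step afterwards is arbitrary post-processing and costs nothing extra:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

# The only access to the sensitive data: an epsilon-DP noisy count.
epsilon = 0.5
release = 42 + rng.laplace(scale=1.0 / epsilon)  # true count is 42

# Post-processing uses the release alone, never the raw data, so the
# published value inherits the same 0.5-DP guarantee.
published = max(0, round(release))
</syntaxhighlight>
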
===Group privacy===
In general, ε-differential privacy is designed to protect the privacy between neighboring databases which differ only in one row. This means that no adversary with arbitrary auxiliary information can know whether '''one''' particular participant submitted their information. This is also extendable if we want to protect databases differing in <math>c</math> rows, which amounts to ensuring that no adversary with arbitrary auxiliary information can know whether '''<math>c</math>''' particular participants submitted their information. This can be achieved because if <math>c</math> items change, the probability dilation is bounded by <math>\exp ( \epsilon c )</math> instead of <math>\exp ( \epsilon )</math>,<ref name="Dwork, ICALP 2006" /> i.e., for D<sub>1</sub> and D<sub>2</sub> differing on <math>c</math> items:

:<math>\Pr[\mathcal{A}(D_{1})\in S]\leq \exp \left ( \epsilon c \right ) \cdot \Pr[\mathcal{A}(D_{2})\in S]\,\!</math>

Thus setting ε instead to <math>\epsilon/c</math> achieves the desired result (protection of <math>c</math> items). In other words, instead of having each item ε-differentially private, now every group of <math>c</math> items is ε-differentially private (and each item is <math>(\epsilon/c)</math>-differentially private).

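A small sketch of this budget scaling, assuming the Laplace mechanism described below and an illustrative <code>group_private_count</code> helper:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def group_private_count(true_count, epsilon, c):
    """Run the sensitivity-1 Laplace mechanism at epsilon/c, i.e. with
    noise scale c/epsilon, so that any group of c rows is protected at
    level epsilon (and each single row at level epsilon/c)."""
    return true_count + rng.laplace(scale=c / epsilon)

# Protect whole households of up to 4 people at epsilon = 1.
release = group_private_count(true_count=120, epsilon=1.0, c=4)
</syntaxhighlight>
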
==ε-differentially private mechanisms==
Since differential privacy is a probabilistic concept, any differentially private mechanism is necessarily randomized. Some of these, like the Laplace mechanism described below, rely on adding controlled noise to the function that we want to compute. Others, like the [[Exponential mechanism (differential privacy)|exponential mechanism]]<ref name=":8">[http://research.microsoft.com/pubs/65075/mdviadp.pdf F. McSherry and K. Talwar. Mechanism Design via Differential Privacy. Proceedings of the 48th Annual Symposium on Foundations of Computer Science, 2007.]</ref> and posterior sampling,<ref name=":9">[https://arxiv.org/abs/1306.1066 Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, Benjamin Rubinstein. Robust and Private Bayesian Inference. Algorithmic Learning Theory 2014]</ref> sample from a problem-dependent family of distributions instead.

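A compact sketch of the exponential mechanism in this spirit; the candidate set, utility function, and data are toy assumptions:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def exponential_mechanism(data, candidates, utility, sensitivity, epsilon):
    """Sample a candidate r with probability proportional to
    exp(epsilon * utility(data, r) / (2 * sensitivity))."""
    scores = np.array([utility(data, r) for r in candidates], dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()          # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy use: privately select the most frequent item; the frequency
# utility has sensitivity 1 (one row changes any count by at most 1).
data = ["a", "b", "b", "c", "b"]
winner = exponential_mechanism(data, ["a", "b", "c"],
                               lambda d, r: d.count(r), 1.0, 1.0)
</syntaxhighlight>
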
===Sensitivity===
Let <math>d</math> be a positive integer, <math>\mathcal{D}</math> be a collection of datasets, and <math>f \colon \mathcal{D} \rightarrow \mathbb{R}^d</math> be a function. The ''sensitivity''<ref name="DMNS06" /> of a function, denoted <math>\Delta f</math>, is defined by

<math>\Delta f=\max \lVert f(D_1)-f(D_2) \rVert_1,</math>

where the maximum is over all pairs of datasets <math>D_1</math> and <math>D_2</math> in <math>\mathcal{D}</math> differing in at most one element and <math>\lVert \cdot \rVert_1</math> denotes the [[Taxicab geometry|<math>\ell_1</math> norm]].

In the example of the medical database below, if we consider <math>f</math> to be the function <math>Q_i</math>, then the sensitivity of the function is one, since changing any one of the entries in the database causes the output of the function to change by either zero or one.

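This can be checked mechanically; in the sketch below, the mock records and the query <code>q</code> stand in for the database and <math>Q_i</math> of the example:

<syntaxhighlight lang="python">
def q(database):
    # Counting query: how many rows have the flag set?
    return sum(1 for row in database if row["diabetes"])

d1 = [{"diabetes": True}, {"diabetes": False}, {"diabetes": True}]
# Toggle each row in turn to form a neighboring database and confirm
# that the output never moves by more than 1, i.e. the sensitivity is 1.
for i in range(len(d1)):
    d2 = [dict(row) for row in d1]
    d2[i]["diabetes"] = not d2[i]["diabetes"]
    assert abs(q(d1) - q(d2)) <= 1
</syntaxhighlight>
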
There are techniques (described below) with which we can create a differentially private algorithm for functions with low sensitivity.

===The Laplace mechanism===
The Laplace mechanism adds Laplace noise, i.e. noise from the Laplace distribution, which can be expressed by the probability density function <math>\text{noise}(y)\propto \exp(-|y|/\lambda)\,\!</math>, and which has mean zero and standard deviation <math>\sqrt{2} \lambda\,\!</math>. In our case, we define the output function of <math>\mathcal{A}\,\!</math> as a real-valued function (called the transcript output by <math>\mathcal{A}\,\!</math>) <math>\mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\!</math>, where <math>Y \sim \text{Lap}(\lambda)\,\!</math> and <math>f\,\!</math> is the original real-valued query/function we planned to execute on the database. Now clearly <math>\mathcal{T}_{\mathcal{A}}(x)\,\!</math> can be considered to be a continuous random variable, where

:<math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math>
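
A minimal numpy sketch of the mechanism, assuming a known sensitivity <math>\Delta f</math> (here a counting query, so <math>\Delta f = 1</math>) and an illustrative <code>laplace_mechanism</code> helper:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(f_value, sensitivity, epsilon):
    """Release f(x) + Y with Y ~ Lap(lambda), lambda = sensitivity/epsilon,
    which makes the transcript epsilon-differentially private."""
    return f_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Counting query from the sensitivity section: Delta f = 1.
private_answer = laplace_mechanism(f_value=42, sensitivity=1.0, epsilon=0.1)
</syntaxhighlight>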