== History ==
Official statistics organizations are charged with collecting information from individuals or establishments, and publishing aggregate data to serve the public interest.  For example, the 1790 United States Census collected information about individuals living in the United States and published tabulations based on sex, age, race, and condition of servitude. Statistical organizations have long collected information under a promise of confidentiality that the information provided will be used for statistical purposes, but that the publications will not produce information that can be traced back to a specific individual or establishment. To accomplish this goal, statistical organizations have long suppressed information in their publications. For example, in a table presenting the sales of each business in a town grouped by business category, a cell that has information from only one company might be suppressed, in order to maintain the confidentiality of that company's specific sales.
 
The adoption of electronic information processing systems by statistical agencies in the 1950s and 1960s dramatically increased the number of tables that a statistical organization could produce and, in so doing, significantly increased the potential for an improper disclosure of confidential information. For example, if a business that had its sales numbers suppressed also had those numbers appear in the total sales of a region, then it might be possible to determine the suppressed value by subtracting the other sales from that total. But there might also be combinations of additions and subtractions that might cause the private information to be revealed. The number of combinations that need to be checked increases exponentially with the number of publications, and it is potentially unbounded if data users are able to make queries of the statistical database using an interactive query system.
 
== ε-differential privacy ==
The 2006 Dwork, McSherry, Nissim and Smith article introduced the concept of ε-differential privacy, a mathematical definition for the privacy loss associated with any data release drawn from a statistical database. (Here, the term statistical database means a set of data that are collected under the pledge of confidentiality for the purpose of producing statistics that, by their production, do not compromise the privacy of those individuals who provided the data.)

The intuition for the 2006 definition of ε-differential privacy is that a person's privacy cannot be compromised by a statistical release if their data are not in the database. Therefore, with differential privacy, the goal is to give each individual roughly the same privacy that would result from having their data removed. That is, the statistical functions run on the database should not overly depend on the data of any one individual.
 
=== Definition of ε-differential privacy ===
Let ε be a positive real number and <math>\mathcal{A}</math> be a randomized algorithm that takes a dataset as input (representing the actions of the trusted party holding the data). Let <math>\textrm{im}\ \mathcal{A}</math> denote the image of <math>\mathcal{A}</math>. The algorithm <math>\mathcal{A}</math> is said to provide ε-differential privacy if, for all datasets <math>D_1</math> and <math>D_2</math> that differ on a single element (i.e., the data of one person), and all subsets <math>S</math> of <math>\textrm{im}\ \mathcal{A}</math>:

:<math>\Pr[\mathcal{A}(D_1) \in S] \leq \exp\left(\epsilon\right) \cdot \Pr[\mathcal{A}(D_2) \in S],</math>

where the probability is taken over the randomness used by the algorithm.
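
As a worked instance of this inequality, consider the classic randomized-response survey (a standard example, not specific to the 2006 paper): each respondent flips a fair coin, answers truthfully on heads, and on tails flips a second coin, reporting "Yes" on heads and "No" on tails. A respondent whose true value is Yes reports Yes with probability 3/4, while one whose true value is No reports Yes with probability 1/4, so

:<math>\frac{\Pr[\text{response} = \text{Yes} \mid X = 1]}{\Pr[\text{response} = \text{Yes} \mid X = 0]} = \frac{3/4}{1/4} = 3,</math>

and symmetrically for "No" responses; the mechanism therefore satisfies ε-differential privacy with <math>\epsilon = \ln 3</math>.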
    
Differential privacy offers strong and robust guarantees that facilitate modular design and analysis of differentially private mechanisms due to its [[#Composability|composability]], [[#Robustness to post-processing|robustness to post-processing]], and graceful degradation in the presence of [[#Group privacy|correlated data]].
 
=== Composability ===
(Self-)composability refers to the fact that the joint distribution of the outputs of (possibly adaptively chosen) differentially private mechanisms satisfies differential privacy.

'''Sequential composition.''' If we query an ε-differential privacy mechanism <math>t</math> times, and the randomization of the mechanism is independent for each query, then the result would be <math>\epsilon t</math>-differentially private. In the more general case, if there are <math>n</math> independent mechanisms: <math>\mathcal{M}_1,\dots,\mathcal{M}_n</math>, whose privacy guarantees are <math>\epsilon_1,\dots,\epsilon_n</math> differential privacy, respectively, then any function <math>g</math> of them: <math>g(\mathcal{M}_1,\dots,\mathcal{M}_n)</math> is <math>\left(\sum\limits_{i=1}^{n} \epsilon_i\right)</math>-differentially private.<ref name="PINQ" />
 
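
The budget bookkeeping this implies can be made concrete. Below is a minimal Python sketch of sequential composition, assuming count queries of sensitivity 1 released through the standard Laplace mechanism; the function and variable names are illustrative, not taken from any particular library:

<syntaxhighlight lang="python">
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release true_value with Laplace noise of scale sensitivity/epsilon."""
    return true_value + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Split a total budget epsilon = 1.0 evenly across three count queries;
# a count query has sensitivity 1 (one person changes it by at most 1).
total_epsilon = 1.0
exact_answers = [42, 17, 8]
eps_per_query = total_epsilon / len(exact_answers)

noisy_answers = [laplace_mechanism(a, 1.0, eps_per_query) for a in exact_answers]

# By sequential composition, the three releases together are
# (3 * eps_per_query) = 1.0-differentially private.
print(noisy_answers)
</syntaxhighlight>
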
=== Robustness to post-processing ===
For any deterministic or randomized function <math>F</math> defined over the image of the mechanism <math>\mathcal{A}</math>, if <math>\mathcal{A}</math> satisfies ε-differential privacy, so does <math>F(\mathcal{A})</math>.

Together, [[#Composability|composability]] and [[#Robustness to post-processing|robustness to post-processing]] permit modular construction and analysis of differentially private mechanisms and motivate the concept of the ''privacy loss budget''. If all elements that access sensitive data of a complex mechanism are separately differentially private, so will be their combination, followed by arbitrary post-processing.
 
=== Group privacy ===
In general, ε-differential privacy is designed to protect the privacy between neighboring databases, which differ only in one row. This means that no adversary with arbitrary auxiliary information can know whether one particular participant submitted their information. This is also extendable if we want to protect databases differing in <math>c</math> rows, which amounts to ensuring that no adversary with arbitrary auxiliary information can know whether <math>c</math> particular participants submitted their information. This can be achieved because if <math>c</math> items change, the probability dilation is bounded by <math>\exp(\epsilon c)</math> instead of <math>\exp(\epsilon)</math>; i.e., for <math>D_1</math> and <math>D_2</math> differing on <math>c</math> items:

:<math>\Pr[\mathcal{A}(D_{1})\in S]\leq \exp\left(\epsilon c\right)\cdot\Pr[\mathcal{A}(D_{2})\in S].\,\!</math>
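
Equivalently, to protect a group of <math>c</math> participants with an overall guarantee of ε, it suffices to run a mechanism that is <math>(\epsilon/c)</math>-differentially private for individuals, since the bound then dilates to <math>\exp\left((\epsilon/c)\cdot c\right)=\exp(\epsilon)</math>.
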
== ε-differentially private mechanisms ==
Since differential privacy is a probabilistic concept, any differentially private mechanism is necessarily randomized. Some of these, like the Laplace mechanism described below, rely on adding controlled noise to the function that we want to compute. Others, like the exponential mechanism<ref>F. McSherry and K. Talwar. Mechanism Design via Differential Privacy. Proceedings of the 48th Annual Symposium on Foundations of Computer Science, 2007.</ref> and posterior sampling,<ref>Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, Benjamin Rubinstein. Robust and Private Bayesian Inference. Algorithmic Learning Theory, 2014.</ref> sample from a problem-dependent family of distributions instead.

=== Sensitivity ===
 
Let <math>d</math> be a positive integer, <math>\mathcal{D}</math> be a collection of datasets, and <math>f \colon \mathcal{D}\rightarrow \mathbb{R}^d</math> be a function. The sensitivity of a function, denoted <math>\Delta f</math>, is defined by

:<math>\Delta f = \max \lVert f(D_1) - f(D_2) \rVert_1,</math>

where the maximum is over all pairs of datasets <math>D_1</math> and <math>D_2</math> in <math>\mathcal{D}</math> differing in at most one element and <math>\lVert \cdot \rVert_1</math> denotes the <math>\ell_1</math> norm.

In the example of the medical database below, if we consider <math>f</math> to be the function <math>Q_i</math>, then the sensitivity of the function is one, since changing any one of the entries in the database causes the output of the function to change by either zero or one.
 
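
The following Python sketch brute-forces this sensitivity calculation for the partial-sum query over a small 0/1 database (illustrative only; in practice sensitivity is bounded analytically rather than by enumeration):

<syntaxhighlight lang="python">
from itertools import product

def partial_sum(db, i):
    """Q_i: the sum of the first i entries of the 0/1 column."""
    return sum(db[:i])

# Brute-force the sensitivity of Q_5 over all 0/1 databases of length 5,
# comparing each database with every neighbor differing in one entry.
sensitivity = 0
for db in product([0, 1], repeat=5):
    for j in range(5):
        neighbor = list(db)
        neighbor[j] = 1 - neighbor[j]  # flip one person's record
        diff = abs(partial_sum(db, 5) - partial_sum(neighbor, 5))
        sensitivity = max(sensitivity, diff)

print(sensitivity)  # 1: changing one entry moves the count by at most one
</syntaxhighlight>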
 
=== The Laplace mechanism ===
The Laplace mechanism adds Laplace noise, i.e. noise from the Laplace distribution, which can be expressed by the probability density function <math>\text{noise}(y)\propto \exp(-|y|/\lambda)\,\!</math>, and which has mean zero and standard deviation <math>\sqrt{2}\lambda\,\!</math>. Now in our case we define the output function of <math>\mathcal{A}\,\!</math> as a real-valued function (called the transcript output by <math>\mathcal{A}\,\!</math>) as <math>\mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\!</math>, where <math>Y \sim \text{Lap}(\lambda)\,\!</math> and <math>f\,\!</math> is the original real-valued query/function we planned to execute on the database. Now clearly <math>\mathcal{T}_{\mathcal{A}}(x)\,\!</math> can be considered to be a continuous random variable, where
:<math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math>
 
which is at most <math>\exp\left(\frac{|f(D_{1})-f(D_{2})|}{\lambda}\right)\leq \exp\left(\frac{\Delta(f)}{\lambda}\right)\,\!</math>. Taking <math>\lambda=\Delta(f)/\epsilon\,\!</math> makes the mechanism ε-differentially private.

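
This bound can be checked numerically. The sketch below compares the two output densities of the Laplace mechanism on neighboring databases; the query values and ε are made-up illustrations:

<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import laplace

epsilon, sensitivity = 0.5, 1.0
scale = sensitivity / epsilon   # lambda = Delta(f) / epsilon

f_d1, f_d2 = 3.0, 2.0           # f evaluated on two neighboring databases

# The ratio of output densities at any point t is bounded by exp(epsilon).
t = np.linspace(-10.0, 10.0, 1001)
ratio = laplace.pdf(t, loc=f_d1, scale=scale) / laplace.pdf(t, loc=f_d2, scale=scale)
assert np.all(ratio <= np.exp(epsilon) + 1e-9)
print(ratio.max(), "<=", np.exp(epsilon))
</syntaxhighlight>
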
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|-
! Name !! Has Diabetes (X)
|-
| Ross || 1
|-
| Monica || 1
|-
| Joey || 0
|-
| Phoebe || 0
|-
| Chandler || 1
|-
| Rachel || 0
|}

Now suppose a malicious user (often termed an ''adversary'') wants to find whether Chandler has diabetes or not. Suppose he also knows in which row of the database Chandler resides. Now suppose the adversary is only allowed to use a particular form of query <math>Q_i</math> that returns the partial sum of the first <math>i</math> rows of column <math>X</math> in the database. In order to find Chandler's diabetes status the adversary executes <math>Q_5(D_1)</math> and <math>Q_4(D_1)</math>, then computes their difference. In this example, <math>Q_5(D_1) = 3</math> and <math>Q_4(D_1) = 2</math>, so their difference is 1. This indicates that the "Has Diabetes" field in Chandler's row must be 1. This example highlights how individual information can be compromised even without explicitly querying for the information of a specific individual.
 
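
The attack, and the way calibrated Laplace noise blunts it, can be sketched as follows (an illustrative Python fragment; the table values mirror the example above and ε = 0.5 is an arbitrary choice):

<syntaxhighlight lang="python">
import numpy as np

# Ross, Monica, Joey, Phoebe, Chandler, Rachel
has_diabetes = [1, 1, 0, 0, 1, 0]

def Q(i):
    """Exact partial-sum query over the first i rows."""
    return sum(has_diabetes[:i])

# Differencing attack on exact answers recovers Chandler's record.
print(Q(5) - Q(4))  # 1 -> Chandler's "Has Diabetes" field is exposed

# With an eps-differentially private Laplace mechanism each answer is
# noisy, so the difference no longer pins down a single record (the two
# queries together spend a budget of 2*eps by sequential composition).
eps = 0.5
def noisy_Q(i):
    return Q(i) + np.random.laplace(scale=1.0 / eps)  # sensitivity 1

print(noisy_Q(5) - noisy_Q(4))  # close to 1 on average, but deniable
</syntaxhighlight>
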
==== Stable transformations ====
A transformation <math>T</math> is <math>c</math>-stable if the Hamming distance between <math>T(A)</math> and <math>T(B)</math> is at most <math>c</math> times the Hamming distance between <math>A</math> and <math>B</math> for any two databases <math>A,B</math>. Theorem 2 in<ref name="PINQ" /> asserts that if there is a mechanism <math>M</math> that is <math>\epsilon</math>-differentially private, then the composite mechanism <math>M\circ T</math> is <math>(\epsilon \times c)</math>-differentially private.

This could be generalized to group privacy, as the group size could be thought of as the Hamming distance <math>h</math> between <math>A</math> and <math>B</math> (where <math>A</math> contains the group and <math>B</math> does not). In this case <math>M\circ T</math> is <math>(\epsilon \times c \times h)</math>-differentially private.

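
As a minimal illustration, any row-by-row transformation is 1-stable. The sketch below assumes databases are equal-length lists of rows and that Hamming distance counts positions where they differ; the transformation is a made-up example:

<syntaxhighlight lang="python">
def hamming(a, b):
    """Number of positions at which two equal-length databases differ."""
    return sum(x != y for x, y in zip(a, b))

def rowwise(transform, db):
    """A per-row map is 1-stable: each differing input row can change
    at most the corresponding output row."""
    return [transform(row) for row in db]

A = [1, 5, 3, 2]
B = [1, 4, 3, 2]                 # neighbor of A: differs in one row
bucketize = lambda x: min(x, 3)  # clip large values into a top bucket

assert hamming(rowwise(bucketize, A), rowwise(bucketize, B)) <= 1 * hamming(A, B)
# Composing an epsilon-DP mechanism M with this T keeps M∘T epsilon-DP.
</syntaxhighlight>
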
== Other notions of differential privacy ==
Since differential privacy is considered to be too strong or too weak for some applications, many versions of it have been proposed. The most widespread relaxation is (ε, δ)-differential privacy, which weakens the definition by allowing an additional small δ density of probability on which the upper bound ε does not hold.
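
Formally, a mechanism <math>\mathcal{A}</math> is (ε, δ)-differentially private if, for all neighboring datasets <math>D_1</math> and <math>D_2</math> and all subsets <math>S</math> of <math>\textrm{im}\ \mathcal{A}</math>:

:<math>\Pr[\mathcal{A}(D_1) \in S] \leq \exp\left(\epsilon\right) \cdot \Pr[\mathcal{A}(D_2) \in S] + \delta.</math>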
    
== Adoption of differential privacy in real-world applications ==
 
Several uses of differential privacy in practice are known to date:
* 2008: U.S. Census Bureau, for showing commuting patterns.
* 2014: Google's RAPPOR, for telemetry such as learning statistics about unwanted software hijacking users' settings.
* 2015: Google, for sharing historical traffic statistics.
* 2016: Apple announced its intention to use differential privacy in iOS 10 to improve its intelligent personal assistant technology.
* 2020: LinkedIn, for advertiser queries.

== Public purpose considerations ==
There are several public purpose considerations regarding differential privacy that are important to consider, especially for policymakers and policy-focused audiences interested in the social opportunities and risks of the technology:

* '''Data Utility & Accuracy.''' The main concern with differential privacy is the tradeoff between data utility and individual privacy. If the privacy loss parameter is set to favor utility, the privacy benefits are lowered (less “noise” is injected into the system); if the privacy loss parameter is set to favor heavy privacy, the accuracy and utility of the dataset are lowered (more “noise” is injected into the system). It is important for policymakers to consider the tradeoffs posed by differential privacy in order to help set appropriate best practices and standards around the use of this privacy preserving practice, especially considering the diversity in organizational use cases. It is worth noting, though, that decreased accuracy and utility is a common issue among all statistical disclosure limitation methods and is not unique to differential privacy. What is unique, however, is how policymakers, researchers, and implementers can consider mitigating against the risks presented through this tradeoff.  
 
== See also ==
* Quasi-identifier
* Exponential mechanism (differential privacy) – a technique for designing differentially private algorithms
* k-anonymity
* Differentially private analysis of graphs
* Protected health information
   第438行: 第438行:  
Dwork, Cynthia, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. "Our data, ourselves: Privacy via distributed noise generation." In Advances in Cryptology-EUROCRYPT 2006, pp. 486–503. Springer Berlin Heidelberg, 2006.
 
== References ==
   
{{Reflist|refs=
 
}}
== Further reading ==
* A reading list on differential privacy
* Abowd, John. 2017. "How Will Statistical Agencies Operate When All Data Are Private?". Journal of Privacy and Confidentiality 7(3). (slides)
* "Differential Privacy: A Primer for a Non-technical Audience", Kobbi Nissim, Thomas Steinke, Alexandra Wood, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, David R. O'Brien, and Salil Vadhan, Harvard Privacy Tools Project, February 14, 2018
* Dinur, Irit and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '03). ACM, New York, NY, USA, 202–210.
* Ryffel, Theo, Andrew Trask, et al. "A generic framework for privacy preserving deep learning"
第607行: 第604行:  
== External links ==
* Differential Privacy by Cynthia Dwork, ICALP July 2006.
* The Algorithmic Foundations of Differential Privacy by Cynthia Dwork and Aaron Roth, 2014.
* Differential Privacy by Katrina Ligett, California Institute of Technology, December 2013
* A Practical Beginner's Guide to Differential Privacy by Christine Task, Purdue University, April 2012
* Private Map Maker v0.2 on the Common Data Project blog
* Learning Statistics with Privacy, aided by the Flip of a Coin by Úlfar Erlingsson, Google Research Blog, October 2014
* Technology Factsheet: Differential Privacy by Raina Gandhi and Amritha Jayanti, Belfer Center for Science and International Affairs, Fall 2020