3,681 bytes removed, 00:58, 11 November 2021 (Thu)
Let <math>d</math> be a positive integer, <math>\mathcal{D}</math> be a collection of datasets, and <math>f \colon \mathcal{D} \rightarrow \mathbb{R}^d</math> be a function. The ''sensitivity'' of a function,<ref name="DMNS06" /> denoted <math>\Delta f</math>, is defined by

: <math>\Delta f=\max \lVert f(D_1)-f(D_2) \rVert_1,</math>

where the maximum is over all pairs of datasets <math>D_1</math> and <math>D_2</math> in <math>\mathcal{D}</math> differing in at most one element and <math>\lVert \cdot \rVert_1</math> denotes the [[Taxicab geometry|<math>\ell_1</math> norm]].
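The definition above can be checked mechanically for a small example. The following sketch (illustrative only, not part of the original article) brute-forces the sensitivity of a counting query over all neighboring pairs of four-record binary datasets and confirms it equals one:

```python
# Illustrative sketch: brute-force check that a counting query
# f(D) = sum of a binary column has sensitivity 1.
from itertools import product

def count_query(dataset):
    """f(D): number of records whose attribute equals 1."""
    return sum(dataset)

def sensitivity(f, neighboring_pairs):
    """Delta f: max |f(D1) - f(D2)| over dataset pairs differing in one element."""
    return max(abs(f(d1) - f(d2)) for d1, d2 in neighboring_pairs)

# Every pair of 4-record binary datasets differing in exactly one element.
neighbors = [
    (list(d), list(d[:i]) + [1 - d[i]] + list(d[i + 1:]))
    for d in product([0, 1], repeat=4)
    for i in range(4)
]

print(sensitivity(count_query, neighbors))  # 1
```

Flipping any single entry changes the count by exactly one, matching the claim made for <math>Q_i</math> below.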
    
In the example of the medical database below, if we consider <math>f</math> to be the function <math>Q_i</math>, then the sensitivity of the function is one, since changing any one of the entries in the database causes the output of the function to change by either zero or one.
 
||0
 
|}
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|-
!Name!!Has Diabetes (X)
|-
|Ross
||1
|-
|Monica
||1
|-
|Joey
||0
|-
|Phoebe
||0
|-
|Chandler
||1
|-
|Rachel
||0
|}
Now suppose a malicious user (often termed an ''adversary'') wants to find whether Chandler has diabetes or not. Suppose he also knows in which row of the database Chandler resides. Now suppose the adversary is only allowed to use a particular form of query <math>Q_i</math> that returns the partial sum of the first <math>i</math> rows of column <math>X</math> in the database. In order to find Chandler's diabetes status the adversary executes <math>Q_5(D_1)</math> and <math>Q_4(D_1)</math>, then computes their difference. In this example, <math>Q_5(D_1) = 3</math> and <math>Q_4(D_1) = 2</math>, so their difference is 1. This indicates that the "Has Diabetes" field in Chandler's row must be 1. This example highlights how individual information can be compromised even without explicitly querying for the information of a specific individual.
 
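The differencing attack just described can be written out in a few lines. This sketch (names and structure are illustrative, taken from the table above) shows how the two allowed partial-sum queries combine to reveal a single row:

```python
# Sketch of the differencing attack described above.
D1 = {"Ross": 1, "Monica": 1, "Joey": 0, "Phoebe": 0, "Chandler": 1, "Rachel": 0}

def Q(i, db):
    """Q_i: partial sum of the first i rows of column X."""
    return sum(list(db.values())[:i])

# Chandler sits in row 5, so the difference of two partial sums isolates him.
print(Q(5, D1), Q(4, D1), Q(5, D1) - Q(4, D1))  # 3 2 1
```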
Continuing this example, if we construct <math>D_2</math> by replacing (Chandler, 1) with (Chandler, 0) then this malicious adversary will be able to distinguish <math>D_2</math> from <math>D_1</math> by computing <math>Q_5 - Q_4</math> for each dataset. If the adversary were required to receive the values <math>Q_i</math> via an <math>\epsilon</math>-differentially private algorithm, for a sufficiently small <math>\epsilon</math>, then he or she would be unable to distinguish between the two datasets.
 
      
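A minimal sketch of why a differentially private release defeats this attack, assuming the Laplace mechanism (a standard construction; the noise scale is sensitivity/ε, and the partial-sum query has sensitivity 1):

```python
# Minimal sketch: answering Q_i with Laplace noise of scale 1/epsilon.
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_partial_sum(column, i, epsilon):
    """Noisy Q_i: the partial-sum query has sensitivity 1, so the
    noise scale is 1/epsilon."""
    return sum(column[:i]) + laplace_noise(1.0 / epsilon)

X = [1, 1, 0, 0, 1, 0]
# For small epsilon the noise swamps Q_5 - Q_4, hiding Chandler's entry.
print(private_partial_sum(X, 5, 0.1) - private_partial_sum(X, 4, 0.1))
```

With ε = 0.1 the difference of the two noisy answers is dominated by noise of scale 10, so the adversary can no longer distinguish <math>D_1</math> from <math>D_2</math>.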
===Randomized response===
 
{{See also|Local differential privacy}}
A simple example, especially developed in the [[social science]]s,<ref name=":7">{{cite journal |last=Warner |first=S. L. |date=March 1965 |title=Randomised response: a survey technique for eliminating evasive answer bias |jstor=2283137 |journal=[[Journal of the American Statistical Association]] |publisher=[[Taylor & Francis]] |volume=60 |issue=309 |pages=63–69 |doi=10.1080/01621459.1965.10480775 |pmid=12261830}}</ref> is to ask a person to answer the question "Do you own the ''attribute A''?", according to the following procedure:
    
#[[Coin flipping|Toss a coin]].
#If heads, then toss the coin again (ignoring the outcome), and answer the question honestly.
#If tails, then toss the coin again and answer "Yes" if heads, "No" if tails.
      
(The seemingly redundant extra toss in the first case is needed in situations where just the ''act'' of tossing a coin may be observed by others, even if the actual result stays hidden.) The confidentiality then arises from the [[Falsifiability|refutability]] of the individual responses.
 
      
But, overall, these data with many responses are significant, since a positive response is given with probability one-quarter by people who do not have the ''attribute A'' and with probability three-quarters by people who actually possess it.

Thus, if ''p'' is the true proportion of people with ''A'', then we expect to obtain (1/4)(1-''p'') + (3/4)''p'' = (1/4) + ''p''/2 positive responses. Hence it is possible to estimate ''p''.
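The coin-toss procedure and the estimate of ''p'' can be verified by simulation. This sketch (illustrative only) runs the protocol on a synthetic population and inverts the observed "Yes" rate via ''p'' = 2(rate − 1/4):

```python
# Simulation sketch of the randomized-response procedure above:
# the "Yes" rate is 1/4 + p/2, so p is recovered as 2*(rate - 1/4).
import random

def randomized_response(has_attribute):
    if random.random() < 0.5:        # first toss heads: answer honestly
        return has_attribute         # (the ignored second toss is omitted)
    return random.random() < 0.5     # tails: second toss decides the answer

random.seed(42)
p_true = 0.3
n = 100_000
answers = [randomized_response(random.random() < p_true) for _ in range(n)]
p_estimate = 2 * (sum(answers) / n - 0.25)
print(round(p_estimate, 2))
```

The estimate converges to the true proportion even though no individual answer can be trusted, which is exactly the refutability argument above.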
      
In particular, if the ''attribute A'' is synonymous with illegal behavior, then answering "Yes" is not incriminating, insofar as the person has a probability of a "Yes" response, whatever it may be.
 
This generalizes to group privacy, as the group size can be thought of as the Hamming distance <math>h</math> between <math>A</math> and <math>B</math> (where <math>A</math> contains the group and <math>B</math> does not). In this case <math>M \circ T</math> is <math>(\epsilon \times c \times h)</math>-differentially private.
   −
==Other notions of differential privacy==
Since differential privacy is considered to be too strong or weak for some applications, many versions of it have been proposed.<ref name="DP19" /> The most widespread relaxation is (ε, δ)-differential privacy,<ref name="DKMMN06" /> which weakens the definition by allowing an additional small δ density of probability on which the upper bound ε does not hold.
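The relaxed (ε, δ) guarantee is commonly achieved by the Gaussian mechanism, a standard construction not described in this article (an assumption here): add normal noise calibrated to the ℓ₂-sensitivity. A hedged sketch:

```python
# Illustrative sketch: the Gaussian mechanism for (epsilon, delta)-DP,
# with the classical calibration sigma = Δ2f·sqrt(2 ln(1.25/delta))/epsilon,
# valid for epsilon < 1.
import math
import random

def gaussian_mechanism(true_answer, l2_sensitivity, epsilon, delta):
    """Add N(0, sigma^2) noise sized so the privacy bound fails only
    with probability density at most delta."""
    sigma = l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
    return true_answer + random.gauss(0.0, sigma)

print(gaussian_mechanism(3.0, 1.0, 0.5, 1e-5))
```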
 
==Adoption of differential privacy in real-world applications==
 
{{see also|Implementations of differentially private analyses}}
<small>This page was moved from [[wikipedia:en:Differential privacy]]. Its edit history can be viewed at [[差分隐私/edithistory]]</small></noinclude>
 
[[Category:待整理页面]]
   