第156行: |
第156行: |
| The Laplace mechanism adds Laplace noise (i.e. noise from the [[Laplace distribution]], which can be expressed by probability density function <math>\text{noise}(y)\propto \exp(-|y|/\lambda)\,\!</math>, which has mean zero and standard deviation <math>\sqrt{2} \lambda\,\!</math>). Now in our case we define the output function of <math>\mathcal{A}\,\!</math> as a real valued function (called as the transcript output by <math>\mathcal{A}\,\!</math>) as <math>\mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\!</math> where <math>Y \sim \text{Lap}(\lambda)\,\!\,\!</math> and <math>f\,\!</math> is the original real valued query/function we planned to execute on the database. Now clearly <math>\mathcal{T}_{\mathcal{A}}(x)\,\!</math> can be considered to be a continuous random variable, where | | The Laplace mechanism adds Laplace noise (i.e. noise from the [[Laplace distribution]], which can be expressed by probability density function <math>\text{noise}(y)\propto \exp(-|y|/\lambda)\,\!</math>, which has mean zero and standard deviation <math>\sqrt{2} \lambda\,\!</math>). Now in our case we define the output function of <math>\mathcal{A}\,\!</math> as a real valued function (called as the transcript output by <math>\mathcal{A}\,\!</math>) as <math>\mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\!</math> where <math>Y \sim \text{Lap}(\lambda)\,\!\,\!</math> and <math>f\,\!</math> is the original real valued query/function we planned to execute on the database. Now clearly <math>\mathcal{T}_{\mathcal{A}}(x)\,\!</math> can be considered to be a continuous random variable, where |
| | | |
− | <math>\displaystyle{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\displaystyle{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math> | + | <math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math> |
| | | |
− | <nowiki>The Laplace mechanism adds Laplace noise (i.e. noise from the Laplace distribution, which can be expressed by probability density function \text{noise}(y)\propto \exp(-|y|/\lambda)\,\!, which has mean zero and standard deviation \sqrt{2} \lambda\,\!). Now in our case we define the output function of \mathcal{A}\,\! as a real valued function (called as the transcript output by \mathcal{A}\,\!) as \mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\! where Y \sim \text{Lap}(\lambda)\,\!\,\! and f\,\! is the original real valued query/function we planned to execute on the database. Now clearly \mathcal{T}_{\mathcal{A}}(x)\,\! can be considered to be a continuous random variable, where</nowiki>
| |
− |
| |
| | | |
− | <nowiki>拉普拉斯机制增加了拉普拉斯的噪音(即。拉普拉斯分布的噪声,可以用概率密度函数文本{ noise }(y) propto exp (- | y |/lambda)来表示,!这个函数的平均值是0和标准差的 sqrt {2} lambda,!).现在,在我们的例子中,我们定义了数学{ a }的输出函数,!作为一个实值函数(由 mathcal { a } ,!)数学{ t } _ {数学{ a }}(x) = f (x) + y,!其中 y sim 文本{ Lap }(lambda) ,! ,!还有 f!是我们计划在数据库上执行的原始实值查询/函数。现在清楚的数学{ t } _ {数学{ a }(x) ,!可以被认为是一个连续的随机变量</nowiki> | + | 拉普拉斯机制增加了拉普拉斯的噪音(如满足拉普拉斯分布的噪声,其可以用概率密度函数<math>\text{noise}(y)\propto \exp(-|y|/\lambda)\,\!</math>来表示,这个函数的平均值是0,标准差为<math>\sqrt{2} \lambda\,\!</math>)。在我们的例子中,我们定义了输出函数<math>\mathcal{A}\,\!</math>作为一个实值函数(由 <math>\mathcal{A}\,\!</math>调用为文本输出)<math>\mathcal{T}_{\mathcal{A}}(x)=f(x)+Y\,\!</math>,其中<math>Y \sim \text{Lap}(\lambda)\,\!\,\!</math>,且<math>f\,\!</math>是我们计划在数据库上执行的原始实值的查询(或函数)。现在可以确定,<math>\mathcal{T}_{\mathcal{A}}(x)\,\!</math>可以被认为是一个连续的随机变量,当 |
− | | |
− | :<math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math>
| |
− |
| |
− | | |
− | :\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!
| |
− |
| |
− | | |
− | : frc { mathrm { pdf }(mathcal { t } _ { mathcal { a } ,d _ 1}(x) = t)}{ mathrm { pdf }(mathcal { t } _ { mathcal { a } ,d _ 2}(x) = t)} = frc { text { noise }(t-f (d _ 1))}{ text { noise }(t-f (d _ 2))} ,!
| |
| | | |
| + | :<math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math><br /> |
| which is at most <math>e^{\frac{|f(D_{1})-f(D_{2})|}{\lambda}}\leq e^{\frac{\Delta(f)}{\lambda}}\,\!</math>. We can consider <math>\frac{\Delta(f)}{\lambda}\,\!</math> to be the privacy factor <math>\epsilon\,\!</math>. Thus <math>\mathcal{T}\,\!</math> follows a differentially private mechanism (as can be seen from [[#ε-differential privacy|the definition above]]). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have <math>\mathcal{A}\,\!</math> as the <math>\epsilon\,\!</math>-differential private algorithm we need to have <math>\lambda=1/\epsilon\,\!</math>. Though we have used Laplace noise here, other forms of noise, such as the Gaussian Noise, can be employed, but they may require a slight relaxation of the definition of differential privacy.<ref name="Dwork, ICALP 2006" /> | | which is at most <math>e^{\frac{|f(D_{1})-f(D_{2})|}{\lambda}}\leq e^{\frac{\Delta(f)}{\lambda}}\,\!</math>. We can consider <math>\frac{\Delta(f)}{\lambda}\,\!</math> to be the privacy factor <math>\epsilon\,\!</math>. Thus <math>\mathcal{T}\,\!</math> follows a differentially private mechanism (as can be seen from [[#ε-differential privacy|the definition above]]). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have <math>\mathcal{A}\,\!</math> as the <math>\epsilon\,\!</math>-differential private algorithm we need to have <math>\lambda=1/\epsilon\,\!</math>. Though we have used Laplace noise here, other forms of noise, such as the Gaussian Noise, can be employed, but they may require a slight relaxation of the definition of differential privacy.<ref name="Dwork, ICALP 2006" /> |
| | | |
− | <nowiki>which is at most e^{\frac{|f(D_{1})-f(D_{2})|}{\lambda}}\leq e^{\frac{\Delta(f)}{\lambda}}\,\!. We can consider \frac{\Delta(f)}{\lambda}\,\! to be the privacy factor \epsilon\,\!. Thus \mathcal{T}\,\! follows a differentially private mechanism (as can be seen from the definition above). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have \mathcal{A}\,\! as the \epsilon\,\!-differential private algorithm we need to have \lambda=1/\epsilon\,\!. Though we have used Laplace noise here, other forms of noise, such as the Gaussian Noise, can be employed, but they may require a slight relaxation of the definition of differential privacy.</nowiki> | + | 其最多为 <math>e^{\frac{|f(D_{1})-f(D_{2})|}{\lambda}}\leq e^{\frac{\Delta(f)}{\lambda}}\,\!</math>。我们可以设<math>\frac{\Delta(f)}{\lambda}\,\!</math>为隐私因子<math>\epsilon\,\!</math>。因此,<math>\mathcal{T}\,\!</math>满足差分隐私机制的规则(从[[#ε-differential privacy|上面的定义]]可以看出)。如果尝试在我们的糖尿病例子中使用这个概念,那么从上述事实可以推出,为了使用<math>\mathcal{A}\,\!</math>作为<math>\epsilon\,\!</math>-差分隐私算法,我们需要满足<math>\lambda=1/\epsilon\,\!</math>。虽然我们在这里使用了拉普拉斯噪音,但是其他形式的噪音,如高斯噪音也是可使用的——但是可能需要稍微放宽差分隐私的定义。<ref name="Dwork, ICALP 2006" /> |
− | | |
− | 最多 e ^ { frac { | f (d _ {1})-f (d _ {2}) | }{ lambda } leq e ^ { frac { Delta (f)}{ lambda } ,。我们可以考虑 frac { Delta (f)}{ lambda } ,!成为隐私保护因素。因此数学{ t } ,!遵循不同的私有机制(从上面的定义可以看出)。如果我们在我们的糖尿病示例中尝试使用这个概念,那么从上面的派生事实可以推出,为了得到数学{ a } ,!作为 epsilon!-我们需要 lambda = 1/epsilon! 。虽然我们在这里使用了拉普拉斯噪音,但是也可以使用其他形式的噪音,比如高斯噪音,但是它们可能需要稍微放宽差分隐私的定义。
| |
| | | |
| According to this definition, differential privacy is a condition on the release mechanism (i.e., the trusted party releasing information ''about'' the dataset) and not on the dataset itself. Intuitively, this means that for any two datasets that are similar, a given differentially private algorithm will behave approximately the same on both datasets. The definition gives a strong guarantee that presence or absence of an individual will not affect the final output of the algorithm significantly. | | According to this definition, differential privacy is a condition on the release mechanism (i.e., the trusted party releasing information ''about'' the dataset) and not on the dataset itself. Intuitively, this means that for any two datasets that are similar, a given differentially private algorithm will behave approximately the same on both datasets. The definition gives a strong guarantee that presence or absence of an individual will not affect the final output of the algorithm significantly. |
| | | |
− | According to this definition, differential privacy is a condition on the release mechanism (i.e., the trusted party releasing information about the dataset) and not on the dataset itself. Intuitively, this means that for any two datasets that are similar, a given differentially private algorithm will behave approximately the same on both datasets. The definition gives a strong guarantee that presence or absence of an individual will not affect the final output of the algorithm significantly.
| + | 根据这个定义,差分隐私是发布机制的一个条件(例如,由可信方发布数据集的信息),而不是数据集本身。直观地说,这意味着对于任何两个相似的数据集,一个给定的差分隐私算法在两个数据集上的表现将大致相同。该定义强有力地保证了个体的存在与否不会对算法的最终输出产生重大影响。 |
− | | |
− | 根据这个定义,差分隐私是发布机制的一个条件(例如,可信方发布数据集的信息) ,而不是数据集本身。直观上,这意味着对于任何两个相似的数据集,一个给定的差异私有算法在两个数据集上的行为大致相同。该定义强有力地保证了个体的存在或不存在不会对算法的最终输出产生重大影响。
| |
| | | |
| For example, assume we have a database of medical records <math>D_1</math> where each record is a pair ('''Name''', '''X'''), where <math>X</math> is a [[Boolean algebra|Boolean]] denoting whether a person has diabetes or not. For example: | | For example, assume we have a database of medical records <math>D_1</math> where each record is a pair ('''Name''', '''X'''), where <math>X</math> is a [[Boolean algebra|Boolean]] denoting whether a person has diabetes or not. For example: |
| | | |
− | For example, assume we have a database of medical records D_1 where each record is a pair (Name, X), where X is a Boolean denoting whether a person has diabetes or not. For example:
| + | 例如,假设我们有一个关于医疗记录的数据库<math>D_1</math>,其中每个记录是一对('''Name''', '''X'''),其中<math>X</math>是一个<font color="#ff8000">布尔值Boolean Algebra</font>,表示一个人是否患有糖尿病。例如: |
− | | |
− | 例如,假设我们有一个医疗记录数据库 d _ 1,其中每个记录是一对(Name,x) ,其中 x 是一个布尔值,表示一个人是否患有糖尿病。例如:
| |
| | | |
| {| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;" | | {| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;" |