'''Differential privacy''' is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. The idea behind differential privacy is that if the effect of making an arbitrary single substitution in the database is small enough, the query result cannot be used to infer much about any single individual, and therefore provides privacy. Another way to describe differential privacy is as a constraint on the algorithms used to publish aggregate information about a [[statistical database]] which limits the disclosure of private information of records whose information is in the database. For example, differentially private algorithms are used by some government agencies to publish demographic information or other statistical aggregates while ensuring [[confidentiality]] of survey responses, and [[#Adoption of differential privacy in real-world applications|by companies]] to collect information about user behavior while controlling what is visible even to internal analysts.
 
Roughly, an algorithm is differentially private if an observer seeing its output cannot tell if a particular individual's information was used in the computation.  
 
Differential privacy is often discussed in the context of identifying individuals whose information may be in a database. Although it does not directly refer to identification and [[Data re-identification|reidentification]] attacks, differentially private algorithms probably resist such attacks.<ref name="DMNS06" />
 
Differential privacy was developed by [[Cryptography|cryptographers]] and thus is often associated with cryptography, and draws much of its language from cryptography.
 
Official statistics organizations are charged with collecting information from individuals or establishments, and publishing aggregate data to serve the public interest.  For example, the [[1790 United States Census]] collected information about individuals living in the United States and published tabulations based on sex, age, race, and condition of servitude. Statistical organizations have long collected information under a promise of [[confidentiality]] that the information provided will be used for statistical purposes, but that the publications will not produce information that can be traced back to a specific individual or establishment. To accomplish this goal, statistical organizations have long suppressed information in their publications. For example, in a table presenting the sales of each business in a town grouped by business category, a cell that has information from only one company might be suppressed, in order to maintain the confidentiality of that company's specific sales.
 
The adoption of electronic information processing systems by statistical agencies in the 1950s and 1960s dramatically increased the number of tables that a statistical organization could produce and, in so doing, significantly increased the potential for an improper disclosure of confidential information. For example, if a business that had its sales numbers suppressed also had those numbers appear in the total sales of a region, then it might be possible to determine the suppressed value by subtracting the other sales from that total. But there might also be combinations of additions and subtractions that might cause the private information to be revealed. The number of combinations that needed to be checked increases exponentially with the number of publications, and it is potentially unbounded if data users are able to make  queries of the statistical database using an interactive query system.
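
A minimal sketch of such a differencing attack, with hypothetical numbers (the business names and figures below are illustrative, not from the text):

<syntaxhighlight lang="python">
# Published: total sales for the region, and all per-business cells except one,
# which was suppressed to protect that company's confidential sales.
region_total = 1_200_000
published_cells = {"Business A": 400_000, "Business B": 550_000}

# Subtracting the published cells from the published total recovers exactly
# the value that suppression was supposed to hide.
suppressed_value = region_total - sum(published_cells.values())
print(suppressed_value)  # 250000
</syntaxhighlight>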
 
In 1977, Tore Dalenius formalized the mathematics of [[cell suppression]].<ref name=":0">{{cite journal|author=Dalenius|first=Tore|year=1977|title=Towards a methodology for statistical disclosure control|url=https://archives.vrdc.cornell.edu/info7470/2011/Readings/dalenius-1977.pdf|journal=Statistik Tidskrift|volume=15}}</ref>

In 1979, [[Dorothy Denning]], [[Peter J. Denning]] and Mayer D. Schwartz formalized the concept of a Tracker, an adversary that could learn the confidential contents of a statistical database by creating a series of targeted queries and remembering the results.<ref name=":1">{{cite journal|author=Dorothy E. Denning|author2=Peter J. Denning|author3=Mayer D. Schwartz|title=The Tracker: A Threat to Statistical Database Security|date=March 1978|url=http://www.dbis.informatik.hu-berlin.de/fileadmin/lectures/SS2011/VL_Privacy/Tracker1.pdf|volume=4|number=1|pages=76–96}}</ref> This and later research showed that privacy properties in a database can only be preserved by considering each new query in light of (possibly all) previous queries. This line of work is sometimes called ''query privacy'', with the final result being that tracking the impact of a query on the privacy of individuals in the database is NP-hard.

In 2003, [[Kobbi Nissim]] and [[Irit Dinur]] demonstrated that it is impossible to publish arbitrary queries on a private statistical database without revealing some amount of private information, and that the entire information content of the database can be revealed by publishing the results of a surprisingly small number of random queries—far fewer than was implied by previous work.<ref name=":2">Irit Dinur and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '03). ACM, New York, NY, USA, 202–210. {{doi|10.1145/773153.773173}}</ref> The general phenomenon is known as the [[Reconstruction attack|Fundamental Law of Information Recovery]], and its key insight, namely that in the most general case privacy cannot be protected without injecting some amount of noise, led to the development of differential privacy.
      
In 2006, [[Cynthia Dwork]], [[Frank McSherry]], [[Kobbi Nissim]] and [[Adam D. Smith]] published an article formalizing the amount of noise that needed to be added and proposing a generalized mechanism for doing so.<ref name="DMNS06" /> Their work was a co-recipient of the 2016 TCC Test-of-Time Award<ref>{{cite web |title=TCC Test-of-Time Award |url=https://www.iacr.org/workshops/tcc/awards.html}}</ref> and the 2017 [[Gödel Prize]].<ref>{{cite web |title=2017 Gödel Prize |url=https://www.eatcs.org/index.php/component/content/article/1-news/2450-2017-godel-prize}}</ref>
 
Together, composability and robustness to post-processing permit modular construction and analysis of differentially private mechanisms, and motivate the concept of the privacy loss budget: if all elements of a complex mechanism that access sensitive data are separately differentially private, then so is their combination, followed by arbitrary post-processing.
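
For illustration, one consequence is the standard sequential composition bound (stated here for concreteness, not spelled out in the text above): running an <math>\epsilon_1</math>-differentially private mechanism and then an <math>\epsilon_2</math>-differentially private mechanism over the same data yields a combined release that is <math>(\epsilon_1+\epsilon_2)</math>-differentially private, so a total privacy loss budget <math>\epsilon</math> can be apportioned across all the queries a system answers.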
===Group privacy===

In general, ε-differential privacy is designed to protect the privacy between neighboring databases which differ only in one row. This means that no adversary with arbitrary auxiliary information can tell whether '''one''' particular participant submitted their information. The definition also extends to protecting databases that differ in <math>c</math> rows, which amounts to preventing an adversary with arbitrary auxiliary information from telling whether '''<math>c</math>''' particular participants submitted their information. This holds because if <math>c</math> items change, the probability dilation is bounded by <math>\exp ( \epsilon c )</math> instead of <math>\exp ( \epsilon )</math>,<ref name="Dwork, ICALP 2006" /> i.e., for D<sub>1</sub> and D<sub>2</sub> differing on <math>c</math> items:
:<math>\Pr[\mathcal{A}(D_{1})\in S]\leq
\exp(\epsilon c)\cdot\Pr[\mathcal{A}(D_{2})\in S]\,\!</math>
Thus setting ε instead to <math>\epsilon/c</math> achieves the desired result (protection of <math>c</math> items), since the dilation is then bounded by <math>\exp((\epsilon/c)\cdot c)=\exp(\epsilon)</math>. In other words, instead of each individual item being protected with ε-differential privacy, now every group of <math>c</math> items is protected with ε-differential privacy (and each individual item with (<math>\epsilon/c</math>)-differential privacy).
   −
==ε-differentially private mechanisms==

Since differential privacy is a probabilistic concept, any differentially private mechanism is necessarily randomized. Some of these, like the Laplace mechanism described below, rely on adding controlled noise to the function that we want to compute. Others, like the [[Exponential mechanism (differential privacy)|exponential mechanism]]<ref>[http://research.microsoft.com/pubs/65075/mdviadp.pdf F. McSherry and K. Talwar. Mechanism Design via Differential Privacy. Proceedings of the 48th Annual Symposium of Foundations of Computer Science, 2007.]</ref> and posterior sampling<ref>[https://arxiv.org/abs/1306.1066 Christos Dimitrakakis, Blaine Nelson, Aikaterini Mitrokotsa, Benjamin Rubinstein. Robust and Private Bayesian Inference. Algorithmic Learning Theory 2014]</ref> sample from a problem-dependent family of distributions instead.
:<math>\frac{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_1}(x)=t)}{\mathrm{pdf}(\mathcal{T}_{\mathcal{A},D_2}(x)=t)}=\frac{\text{noise}(t-f(D_1))}{\text{noise}(t-f(D_2))}\,\!</math>
    
which is at most <math>e^{\frac{|f(D_{1})-f(D_{2})|}{\lambda}}\leq e^{\frac{\Delta(f)}{\lambda}}\,\!</math>. We can consider <math>\frac{\Delta(f)}{\lambda}\,\!</math> to be the privacy factor <math>\epsilon\,\!</math>. Thus <math>\mathcal{T}\,\!</math> follows a differentially private mechanism (as can be seen from [[#&epsilon;-differential privacy|the definition above]]). If we try to use this concept in our diabetes example then it follows from the above derived fact that in order to have <math>\mathcal{A}\,\!</math> as the <math>\epsilon\,\!</math>-differential private algorithm we need to have <math>\lambda=1/\epsilon\,\!</math>. Though we have used Laplace noise here, other forms of noise, such as the Gaussian Noise, can be employed, but they may require a slight relaxation of the definition of differential privacy.<ref name="Dwork, ICALP 2006" />
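
A minimal sketch of the resulting mechanism for the diabetes example (Python; the helper name and data layout are illustrative, not from a library). A counting query has sensitivity <math>\Delta(f)=1</math>, so adding Laplace noise with scale <math>\lambda=1/\epsilon</math> yields an <math>\epsilon</math>-differentially private release:

<syntaxhighlight lang="python">
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value plus Laplace noise with scale sensitivity/epsilon."""
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Column X of the table below: 1 means the person has diabetes.
x = [1, 1, 0, 0, 1, 0]     # Ross, Monica, Joey, Phoebe, Chandler, Rachel
f = sum(x)                 # counting query: f(D) = 3, with sensitivity Delta(f) = 1

rng = np.random.default_rng(0)
print(laplace_mechanism(f, sensitivity=1, epsilon=0.5, rng=rng))  # noisy count
</syntaxhighlight>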
{| class="wikitable" style="margin-left: auto; margin-right: auto; border: none;"
|-
!Name!!Has Diabetes (X)
|-
|Ross||1
|-
|Monica||1
|-
|Joey||0
|-
|Phoebe||0
|-
|Chandler||1
|-
|Rachel||0
|}
   第304行: 第287行:     
#[[Coin flipping|Toss a coin]].
#If heads, then toss the coin again (ignoring the outcome), and answer the question honestly.
#If tails, then toss the coin again and answer "Yes" if heads, "No" if tails.
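
A minimal sketch of this procedure (Python; the function and the de-biasing step are illustrative). A respondent answers "Yes" with probability <math>p/2 + 1/4</math> when the true proportion of "Yes" is <math>p</math>, so the surveyor can invert that expectation to estimate <math>p</math> while each individual answer remains deniable:

<syntaxhighlight lang="python">
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """One respondent's answer under the coin-toss protocol above."""
    if rng.random() < 0.5:         # first toss is heads: answer honestly
        return truth               # (the second, ignored toss is omitted)
    return rng.random() < 0.5      # first toss is tails: second toss decides

rng = random.Random(0)
truths = [True] * 300 + [False] * 700       # hypothetical population, p = 0.3
answers = [randomized_response(t, rng) for t in truths]

# E[share of "Yes" answers] = p/2 + 1/4, so invert it to estimate p:
p_hat = 2 * (sum(answers) / len(answers)) - 0.5
print(round(p_hat, 3))                      # close to 0.3
</syntaxhighlight>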
Since differential privacy is considered to be too strong or too weak for some applications, many versions of it have been proposed. The most widespread relaxation is (ε, δ)-differential privacy, which weakens the definition by allowing an additional small δ density of probability on which the upper bound ε does not hold.
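
In symbols (a standard formulation of this relaxation, shown here for concreteness), a mechanism <math>\mathcal{A}</math> is (ε, δ)-differentially private if for all neighboring databases D<sub>1</sub> and D<sub>2</sub> and all sets S:

:<math>\Pr[\mathcal{A}(D_{1})\in S]\leq \exp(\epsilon)\cdot\Pr[\mathcal{A}(D_{2})\in S]+\delta\,\!</math>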
==Adoption of differential privacy in real-world applications==

{{see also|Implementations of differentially private analyses}}

Several uses of differential privacy in practice are known to date:
*2008: U.S. Census Bureau, for showing commuting patterns.
*2014: Google's RAPPOR, for telemetry such as learning statistics about unwanted software hijacking users' settings.
*2015: Google, for sharing historical traffic statistics.<ref name="Eland" />
*2016: [[Apple Inc.|Apple]] announced its intention to use differential privacy in [[iOS 10]] to improve its [[Intelligent personal assistant]] technology.<ref>{{cite web|title=Apple - Press Info - Apple Previews iOS 10, the Biggest iOS Release Ever|url=https://www.apple.com/pr/library/2016/06/13Apple-Previews-iOS-10-The-Biggest-iOS-Release-Ever.html|website=Apple|access-date=16 June 2016}}</ref>
*2017: Microsoft, for telemetry in Windows.<ref name="DpWinTelemetry" />
*2019: Privitar Lens is an API using differential privacy.<ref>{{cite web|title=Privitar Lens|url=https://www.privitar.com/privitar-lens|access-date=20 February 2018}}</ref>
*2020: LinkedIn, for advertiser queries.<ref name="DpLinkedIn" />
==See also==

*Exponential mechanism (differential privacy) – a technique for designing differentially private algorithms
*k-anonymity
*Differentially private analysis of graphs
*Protected health information
==References==
{{Reflist|refs=
<ref name="DKMMN06">
==Further reading==

*A reading list on differential privacy
*Abowd, John. 2017. "How Will Statistical Agencies Operate When All Data Are Private?". Journal of Privacy and Confidentiality 7(3). (slides)
*"Differential Privacy: A Primer for a Non-technical Audience", Kobbi Nissim, Thomas Steinke, Alexandra Wood, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, David R. O'Brien, and Salil Vadhan, Harvard Privacy Tools Project, February 14, 2018
*Dinur, Irit and Kobbi Nissim. 2003. Revealing information while preserving privacy. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems (PODS '03). ACM, New York, NY, USA, 202–210. {{doi|10.1145/773153.773173}}
*Dwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In Halevi, S. & Rabin, T. (Eds.), Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4–7, 2006. Proceedings, Springer Berlin Heidelberg, 265–284.
*Dwork, Cynthia. 2006. Differential Privacy, 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), Springer Verlag, 4052, 1–12.
*Dwork, Cynthia and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science. Vol. 9, Nos. 3–4. 211–407.
*Machanavajjhala, Ashwin, Daniel Kifer, John M. Abowd, Johannes Gehrke, and Lars Vilhuber. 2008. Privacy: Theory Meets Practice on the Map, International Conference on Data Engineering (ICDE) 2008: 277–286, {{doi|10.1109/ICDE.2008.4497436}}.
*Dwork, Cynthia and Moni Naor. 2010. On the Difficulties of Disclosure Prevention in Statistical Databases or The Case for Differential Privacy, Journal of Privacy and Confidentiality: Vol. 2: Iss. 1, Article 8. Available at: http://repository.cmu.edu/jpc/vol2/iss1/8.
*Kifer, Daniel and Ashwin Machanavajjhala. 2011. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD '11). ACM, New York, NY, USA, 193–204. {{doi|10.1145/1989323.1989345}}.
*Erlingsson, Úlfar, Vasyl Pihur and Aleksandra Korolova. 2014. RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS '14). ACM, New York, NY, USA, 1054–1067. {{doi|10.1145/2660267.2660348}}.
*Abowd, John M. and Ian M. Schmutte. 2017. Revisiting the economics of privacy: Population statistics and confidentiality protection as public goods. Labor Dynamics Institute, Cornell University, at https://digitalcommons.ilr.cornell.edu/ldi/37/
==External links==

*Private Map Maker v0.2 on the Common Data Project blog
*Learning Statistics with Privacy, aided by the Flip of a Coin by Úlfar Erlingsson, Google Research Blog, October 2014
*Technology Factsheet: Differential Privacy, by Raina Gandhi and Amritha Jayanti, Belfer Center for Science and International Affairs, Fall 2020
    
[[Category:Theory of cryptography]]