− | Content analysis has long been a traditional part of the social sciences and media studies. Automated content analysis makes it possible to study millions of news items from social media, newspapers, and magazines, and in doing so it has brought the "big data revolution" to the social sciences. Gender discrimination, readability, content similarity, reader preferences, and even mood have all been studied by applying text mining methods to millions of documents. | + | Content analysis has long been a traditional part of the social sciences and media studies. Automated content analysis makes it possible to study millions of news items from social media, newspapers, and magazines, and in doing so it has brought the "big data revolution" to the social sciences. Gender bias, readability, content similarity, reader preferences, and even mood have all been studied by applying text mining methods to millions of documents.<ref>{{cite journal|author1=I. Flaounas|author2=M. Turchi|author3=O. Ali|author4=N. Fyson|author5=T. De Bie|author6=N. Mosdell|author7=J. Lewis|author8=N. Cristianini|title=The Structure of EU Mediasphere|journal=PLOS One|volume=5|issue=12|pages=e14243|year=2010|doi=10.1371/journal.pone.0014243|url=https://orca-mwe.cf.ac.uk/50732/1/Flaounas%202010.pdf|pmid=21170383|pmc=2999531|bibcode=2010PLoSO...514243F}}</ref><ref>{{cite journal|title=Nowcasting Events from the Social Web with Statistical Learning|author1=V Lampos|author2=N Cristianini|journal=ACM Transactions on Intelligent Systems and Technology |volume=3|issue=4|page=72|doi=10.1145/2337542.2337557|year=2012|url=http://www.lampos.net/sites/default/files/papers/lampos2012nowcasting.pdf}}</ref><ref>{{cite conference|title=NOAM: news outlets analysis and monitoring system|author1=I. Flaounas|author2=O. Ali|author3=M. Turchi|author4=T Snowsill|author5=F Nicart|author6=T De Bie|author7=N Cristianini|conference=Proc. of the 2011 ACM SIGMOD international conference on Management of data|year=2011|url=http://www.tijldebie.net/system/files/SIGMOD_11_demo_Ilias.pdf|doi=10.1145/1989323.1989474}}</ref><ref>{{cite book|author=N Cristianini|title=''Combinatorial Pattern Matching''|pages=2–13|year=2011|volume=6661|series= Lecture Notes in Computer Science|isbn=978-3-642-21457-8|doi=10.1007/978-3-642-21458-5_2|chapter=Automatic Discovery of Patterns in Media Content|citeseerx=10.1.1.653.9525}}</ref><ref>{{Cite journal|last=Lansdall-Welfare|first=Thomas|last2=Sudhahar|first2=Saatviga|last3=Thompson|first3=James|last4=Lewis|first4=Justin|last5=Team|first5=FindMyPast Newspaper|last6=Cristianini|first6=Nello|date=2017-01-09|title=Content analysis of 150 years of British periodicals|url=http://www.pnas.org/content/early/2017/01/03/1606380114|journal=Proceedings of the National Academy of Sciences|volume=114|issue=4|language=en|pages=E457–E465|doi=10.1073/pnas.1606380114|issn=0027-8424|pmid=28069962|pmc=5278459}}</ref> |
− | [[Content analysis]] has long been a traditional part of the social sciences and media studies. The automation of content analysis has allowed a "big data" revolution to take place in that field, with studies of social media and newspaper content covering millions of news items. [[Gender bias]], [[readability]], content similarity, reader preferences, and even mood have been analyzed by applying [[text mining]] methods to millions of documents.<ref>{{cite journal|author1=I. Flaounas|author2=M. Turchi|author3=O. Ali|author4=N. Fyson|author5=T. De Bie|author6=N. Mosdell|author7=J. Lewis|author8=N. Cristianini|title=The Structure of EU Mediasphere|journal=PLOS One|volume=5|issue=12|pages=e14243|year=2010|doi=10.1371/journal.pone.0014243|url=https://orca-mwe.cf.ac.uk/50732/1/Flaounas%202010.pdf|pmid=21170383|pmc=2999531|bibcode=2010PLoSO...514243F}}</ref><ref>{{cite journal|title=Nowcasting Events from the Social Web with Statistical Learning|author1=V Lampos|author2=N Cristianini|journal=ACM Transactions on Intelligent Systems and Technology |volume=3|issue=4|page=72|doi=10.1145/2337542.2337557|year=2012|url=http://www.lampos.net/sites/default/files/papers/lampos2012nowcasting.pdf}}</ref><ref>{{cite conference|title=NOAM: news outlets analysis and monitoring system|author1=I. Flaounas|author2=O. Ali|author3=M. Turchi|author4=T Snowsill|author5=F Nicart|author6=T De Bie|author7=N Cristianini|conference=Proc. of the 2011 ACM SIGMOD international conference on Management of data|year=2011|url=http://www.tijldebie.net/system/files/SIGMOD_11_demo_Ilias.pdf|doi=10.1145/1989323.1989474}}</ref><ref>{{cite book|author=N Cristianini|title=''Combinatorial Pattern Matching''|pages=2–13|year=2011|volume=6661|series= Lecture Notes in Computer Science|isbn=978-3-642-21457-8|doi=10.1007/978-3-642-21458-5_2|chapter=Automatic Discovery of Patterns in Media Content|citeseerx=10.1.1.653.9525}}</ref><ref>{{Cite journal|last=Lansdall-Welfare|first=Thomas|last2=Sudhahar|first2=Saatviga|last3=Thompson|first3=James|last4=Lewis|first4=Justin|last5=Team|first5=FindMyPast Newspaper|last6=Cristianini|first6=Nello|date=2017-01-09|title=Content analysis of 150 years of British periodicals|url=http://www.pnas.org/content/early/2017/01/03/1606380114|journal=Proceedings of the National Academy of Sciences|volume=114|issue=4|language=en|pages=E457–E465|doi=10.1073/pnas.1606380114|issn=0027-8424|pmid=28069962|pmc=5278459}}</ref> The analysis of readability, gender bias, and topic bias was demonstrated by Flaounas et al.,<ref>{{cite journal|author1=I. Flaounas|author2=O. Ali|author3=M. Turchi|author4=T. Lansdall-Welfare|author5=T. De Bie|author6=N. Mosdell|author7=J. Lewis|author8=N. Cristianini|title=Research methods in the age of digital journalism|journal=Digital Journalism|year=2012|doi=10.1080/21670811.2012.714928|volume=1|pages=102–116}}</ref> who showed that different topics have different gender biases and levels of readability; the possibility of detecting mood shifts in a large population by analyzing Twitter content has also been demonstrated.<ref>{{cite conference|title=Effects of the Recession on Public Mood in the UK|author=T Lansdall-Welfare|author2=V Lampos|author3=N Cristianini|series=Mining Social Network Dynamics (MSND) session on Social Media Applications|doi=10.1145/2187980.2188264|conference=Proceedings of the 21st International Conference on World Wide Web|pages=1221–1226|location=New York, NY, USA|url=http://www.cs.bris.ac.uk/Publications/Papers/2001521.pdf}}</ref>
| + | Flaounas et al.<ref>{{cite journal|author1=I. Flaounas|author2=O. Ali|author3=M. Turchi|author4=T. Lansdall-Welfare|author5=T. De Bie|author6=N. Mosdell|author7=J. Lewis|author8=N. Cristianini|title=Research methods in the age of digital journalism|journal=Digital Journalism|year=2012|doi=10.1080/21670811.2012.714928|volume=1|pages=102–116}}</ref> analyzed readability, gender bias, and topic bias, showing that different topics have different gender biases and levels of readability. The possibility of detecting mood shifts in a large population by analyzing Twitter content has also been demonstrated.<ref>{{cite conference|title=Effects of the Recession on Public Mood in the UK|author=T Lansdall-Welfare|author2=V Lampos|author3=N Cristianini|series=Mining Social Network Dynamics (MSND) session on Social Media Applications|doi=10.1145/2187980.2188264|conference=Proceedings of the 21st International Conference on World Wide Web|pages=1221–1226|location=New York, NY, USA|url=http://www.cs.bris.ac.uk/Publications/Papers/2001521.pdf}}</ref> |