Changes

Information extraction (view source)

Revision as of 19:30, 28 August 2021 (Sat)

14 bytes removed, 19:30, 28 August 2021 (Sat)

→ Tasks and subtasks

Line 60: Line 60:

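*** PERSON located in LOCATION (extracted from the sentence "Bill is in France.")

* Semi-structured information extraction, an umbrella term for any information extraction approach that tries to restore an information structure lost during publication, such as: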

−

** Table extraction: finding and extracting tables from documents<ref name=":9">{{cite journal | vauthors = Milosevic N, Gregson C, Hernandez R, Nenadic G | title = A framework for information extraction from tables in biomedical literature | journal = International Journal on Document Analysis and Recognition (IJDAR) | volume = 22 | issue = 1 | pages = 55–78 | date = February 2019 | doi = 10.1007/s10032-019-00317-0 | arxiv = 1902.10031 | bibcode = 2019arXiv190210031M ~~| s2cid = 62880746~~ }}</ref><ref name=":10">{{cite thesis |type=PhD |last=Milosevic |first=Nikola |date=2018 |title=A multi-layered approach to information extraction from tables in biomedical documents |publisher=University of Manchester | url=https://www.research.manchester.ac.uk/portal/files/70405100/FULL_TEXT.PDF}}</ref>.

+

** Table extraction: finding and extracting tables from documents<ref name=":9">{{cite journal | vauthors = Milosevic N, Gregson C, Hernandez R, Nenadic G | title = A framework for information extraction from tables in biomedical literature | journal = International Journal on Document Analysis and Recognition (IJDAR) | volume = 22 | issue = 1 | pages = 55–78 | date = February 2019 | doi = 10.1007/s10032-019-00317-0 | arxiv = 1902.10031 | bibcode = 2019arXiv190210031M }}</ref><ref name=":10">{{cite thesis |type=PhD |last=Milosevic |first=Nikola |date=2018 |title=A multi-layered approach to information extraction from tables in biomedical documents |publisher=University of Manchester | url=https://www.research.manchester.ac.uk/portal/files/70405100/FULL_TEXT.PDF}}</ref>.

** Table information extraction: extracting information from tables in a structured manner. This is more complex than table extraction, because table extraction is only the first step; understanding the roles of cells, rows, and columns, linking the information inside the table, and interpreting the information the table presents are additional tasks that table information extraction requires.<ref>{{cite journal | vauthors = Milosevic N, Gregson C, Hernandez R, Nenadic G | title = A framework for information extraction from tables in biomedical literature | journal = International Journal on Document Analysis and Recognition (IJDAR) | volume = 22 | issue = 1 | pages = 55–78 | date = February 2019 | doi = 10.1007/s10032-019-00317-0 | arxiv = 1902.10031 | bibcode = 2019arXiv190210031M | s2cid = 62880746 }}</ref><ref>{{cite journal | vauthors = Milosevic N, Gregson C, Hernandez R, Nenadic G | title = Disentangling the structure of tables in scientific literature | journal = 21st International Conference on Applications of Natural Language to Information Systems | series = Lecture Notes in Computer Science | volume = 21 | date = June 2016 | pages = 162–174 | doi = 10.1007/978-3-319-41754-7_14 | isbn = 978-3-319-41753-0 | url = https://www.research.manchester.ac.uk/portal/en/publications/disentangling-the-structure-of-tables-in-scientific-literature(473111c2-52e9-493a-be8c-1a78c5b7ce36).html }}</ref><ref>{{cite thesis |type=PhD |last=Milosevic |first=Nikola |date=2018 |title=A multi-layered approach to information extraction from tables in biomedical documents |publisher=University of Manchester | url=https://www.research.manchester.ac.uk/portal/files/70405100/FULL_TEXT.PDF}}</ref>

** Comment extraction: extracting comments from the actual content of an article in order to restore the link between each sentence and its author
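
The difference between the two table tasks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited works: it assumes simple HTML input with a single header row, no nested tables, and no merged cells. The names `TableExtractor`, `extract_tables`, and `table_to_records` are hypothetical, and the sample table is made up for the example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Table extraction: locate <table> elements and collect raw cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows (lists of strings)
        self._row = None   # row being built, or None outside <tr>
        self._cell = None  # text chunks of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Step 1: find every table in the document and return its cells."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


def table_to_records(table):
    """Step 2 (simplified table information extraction): assume the first
    row holds column headers and link each data cell to its header."""
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


# Made-up sample document, used only to exercise the sketch.
html = """
<p>Some surrounding text.</p>
<table>
  <tr><th>Drug</th><th>Dose</th></tr>
  <tr><td>Aspirin</td><td>100 mg</td></tr>
  <tr><td>Ibuprofen</td><td>200 mg</td></tr>
</table>
"""

tables = extract_tables(html)
records = table_to_records(tables[0])
```

Here `tables` holds only the raw grid of cells, whereas `records` attaches a role to each cell by pairing it with its column header. Real tables with multi-level headers, row stubs, or merged cells need considerably more analysis than this header-row assumption.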

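Line 74: Line 74:

Information extraction from non-text documents is becoming an increasingly prominent research topic, and information extracted from multimedia documents can now be expressed in a high-level structure, as is done for text. This naturally leads to the fusion of information extracted from multiple kinds of documents and sources.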

+

<br>

==World Wide Web applications==

薄荷

7,129

edits