Changes

Information extraction (view source)

Revision as of 19:33, 28 August 2021 (Sat)

2 bytes added, 19:33, 28 August 2021 (Sat)

Line 84: Line 84: −

A recent development is visual information extraction,<ref name=":3">{{cite arXiv|eprint = 1506.08454|title=WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction|first1=Vijil |last1=Chenthamarakshan|first2=Prasad M |last2=Desphande |first3= Raghu |last3=Krishnapuram |first4= Ramakrishnan |last4=Varadarajan |first5= Knut |last5=Stolze|year=2015|class=cs.CL}}</ref><ref name=":4">{{cite ~~document~~|citeseerx = 10.1.1.21.8236|title=Visual Web Information Extraction with Lixto|first1=Robert |last1=Baumgartner|first2=Sergio |last2=Flesca |first3= Georg |last3=Gottlob|year=2001|pages=119–128}}</ref> which relies on rendering a web page in a browser and creating rules based on the proximity of regions in the rendered page. This helps extract entities from complex web pages that may exhibit a visual pattern but lack a discernible pattern in the HTML source code.

+

A recent development is visual information extraction,<ref name=":3">{{cite arXiv|eprint = 1506.08454|title=WYSIWYE: An Algebra for Expressing Spatial and Textual Rules for Information Extraction|first1=Vijil |last1=Chenthamarakshan|first2=Prasad M |last2=Desphande |first3= Raghu |last3=Krishnapuram |first4= Ramakrishnan |last4=Varadarajan |first5= Knut |last5=Stolze|year=2015|class=cs.CL}}</ref><ref name=":4">{{cite paper|citeseerx = 10.1.1.21.8236|title=Visual Web Information Extraction with Lixto|first1=Robert |last1=Baumgartner|first2=Sergio |last2=Flesca |first3= Georg |last3=Gottlob|year=2001|pages=119–128}}</ref> which relies on rendering a web page in a browser and creating rules based on the proximity of regions in the rendered page. This helps extract entities from complex web pages that may exhibit a visual pattern but lack a discernible pattern in the HTML source code.
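
The proximity idea in the paragraph above can be illustrated with a minimal Python sketch. It is not the algebra from the WYSIWYE paper nor the Lixto system; it assumes the page has already been rendered and that each text region's bounding box is available. The `Region` class, the `nearest_value` helper, the `max_distance` threshold, and the sample coordinates are all hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """A rendered page region: its text plus a bounding box in pixels (hypothetical)."""
    text: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def distance(a, b):
    """Euclidean distance between the centers of two regions."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def nearest_value(label_text, regions, max_distance=200.0):
    """Proximity rule: return the text of the region closest to the label.

    Returns None when the label is absent or no region lies within max_distance.
    """
    label = next((r for r in regions if r.text == label_text), None)
    if label is None:
        return None
    candidates = [
        r for r in regions
        if r is not label and distance(label, r) <= max_distance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: distance(label, r)).text


# Assumed layout: "Price:" and "$19.99" sit side by side on screen,
# even though they could be far apart in the HTML source.
regions = [
    Region("Price:", 10, 100, 60, 20),
    Region("$19.99", 80, 100, 60, 20),
    Region("Add to cart", 10, 400, 120, 30),
]
print(nearest_value("Price:", regions))
```

The rule looks only at on-screen geometry, so it does not depend on how the markup nests the label and its value. A practical system would likely add directional constraints (for example, "to the right of" or "below") rather than rely on raw distance alone.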

+

<br>

==Methods==

薄荷

7,129 edits
