Changes

Information extraction (view source)

24 bytes removed, 21:45, 18 August 2021 (Wednesday)

Improved the translation of subheadings

Line 128: Line 128:

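A more recent development is visual information extraction,<ref name=":3" /><ref name=":4" /> which relies on rendering a web page in a browser and creating rules based on the proximity of regions in the rendered page. This helps extract entities from complex web pages that may exhibit a visual pattern but lack a discernible pattern in the HTML source code.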
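As a rough illustration of a proximity rule of this kind, the sketch below assumes that a browser has already rendered the page and reported a bounding box for each text region. The `Box` type, the `value_right_of` helper, the `max_gap` threshold, and the sample coordinates are all hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """A rendered text region (hypothetical structure, units in pixels)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def value_right_of(label: Box, boxes: List[Box], max_gap: float = 50.0) -> Optional[Box]:
    """Return the nearest box to the right of `label` on roughly the same row."""
    candidates = [
        b for b in boxes
        if b is not label
        and abs(b.y - label.y) < label.h / 2             # roughly the same visual row
        and 0 <= b.x - (label.x + label.w) <= max_gap    # to the right and close by
    ]
    # The closest candidate is the one with the smallest left edge.
    return min(candidates, key=lambda b: b.x, default=None)


# Made-up layout: a label, its value next to it, and an unrelated footer.
boxes = [
    Box("Price:", 10, 100, 40, 12),
    Box("$19.99", 60, 101, 45, 12),
    Box("Footer", 10, 400, 80, 12),
]
match = value_right_of(boxes[0], boxes)
```

The rule uses only rendered geometry, so it would apply even when the label and the value sit in unrelated parts of the HTML tree.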

−

==~~Approaches~~==

+

==方法==

The following standard approaches are now widely accepted:

* Hand-written regular expressions (or nested groups of regular expressions)
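A minimal sketch of the hand-written regular expression approach, assuming a toy task of pulling acquisition relations out of English sentences. The pattern, the group names, and the sample sentence are illustrative assumptions, not part of the article.

```python
import re

# Hypothetical pattern: a capitalized name, the verb "acquired", another capitalized name.
NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"
ACQUISITION = re.compile(rf"(?P<buyer>{NAME}) acquired (?P<target>{NAME})")


def extract_acquisitions(text):
    """Return (buyer, target) pairs for every match of the pattern in `text`."""
    return [(m.group("buyer"), m.group("target")) for m in ACQUISITION.finditer(text)]


pairs = extract_acquisitions("Yesterday, Foo Corp acquired Bar Inc for an undisclosed sum.")
```

Patterns like this are precise on the phrasing they were written for but miss paraphrases such as "was bought by", which is one reason hand-written rules are often combined with other approaches.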

Line 173: Line 173:

Numerous other approaches exist for IE, including hybrid approaches that combine some of the standard approaches previously listed.

−

==~~Free or open source software and services~~==

+

==开源资源与服务==

* [[General Architecture for Text Engineering]] (GATE) is bundled with a free Information Extraction system

* Apache [[OpenNLP]] is a Java machine learning toolkit for natural language processing
