Java Open-Source Web Data Extraction Tool: Web-Harvest
jopen
12 years ago
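Web-Harvest is an open-source web data extraction tool written in Java. It collects specified web pages and extracts useful data from them. To do this, Web-Harvest mainly relies on technologies such as XSLT, XQuery, and regular expressions to process text and XML.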
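Web-Harvest drives these steps from its own XML configuration, whose elements are not covered here. As a rough, JDK-only sketch of the underlying techniques (not Web-Harvest's API), the code below applies an XSLT stylesheet to a page that is assumed to be well-formed XML already, then pulls values out of the result with a regular expression. XQuery is left out because the JDK ships no XQuery engine. The class `ExtractionSketch`, its methods, the stylesheet, and the sample page are hypothetical illustrations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of XSLT + regex extraction; not part of Web-Harvest.
class ExtractionSketch {

    // Illustrative stylesheet: outputs the text of every <a> element, one per line.
    static final String LINKS_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//a\">"
        + "<xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies an XSLT stylesheet to a well-formed XML document and returns the output as text. */
    static String applyXslt(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    /** Returns capture group 1 of every match of the regex; the regex must define that group. */
    static List<String> extractAll(String text, String regex) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for a downloaded page that has already been cleaned up into well-formed XML.
        String page = "<html><body>"
            + "<a href=\"/item/1\">Price: 10 USD</a>"
            + "<a href=\"/item/2\">Price: 25 USD</a>"
            + "</body></html>";

        String linkText = applyXslt(page, LINKS_XSLT);                   // one link label per line
        List<String> prices = extractAll(linkText, "Price: (\\d+) USD");
        System.out.println(prices); // prints [10, 25]
    }
}
```

Splitting the work this way reflects the division described above: XSLT reshapes the document structure, and the regular expression handles the free-text part (the number inside each link label).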
1. Welcome screen with quick links
2. Web-Harvest XML editing with auto-completion support (Ctrl + Space)
3. Defining initial variables that are pushed to the Web-Harvest context before execution starts
4. Settings dialog
5. Viewing the execution result as XML and testing XPath expressions against it
6. Viewing downloaded images while execution is in progress
7. Checking attributes of HTTP execution
8. Debugging
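Item 5 above lets you test an XPath expression against the XML produced by a run. Outside the GUI, a similar check can be approximated with the JDK's `javax.xml.xpath` API. This is an assumption-based sketch, not how Web-Harvest implements the feature: `XPathCheck`, `select`, and the sample `<products>` document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for trying XPath expressions against an XML result; not part of Web-Harvest.
class XPathCheck {

    /** Evaluates an XPath expression against an XML string and returns the text of every matched node. */
    static List<String> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Element nodes yield their concatenated text; attribute nodes yield their value.
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Raised when the expression cannot be evaluated against the given input.
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical execution result, standing in for the XML shown in the result viewer.
        String result = "<products>"
            + "<product id=\"1\"><name>Lamp</name><price>10</price></product>"
            + "<product id=\"2\"><name>Desk</name><price>25</price></product>"
            + "</products>";

        System.out.println(select(result, "//product[price > 15]/name")); // prints [Desk]
        System.out.println(select(result, "//product/@id"));              // prints [1, 2]
    }
}
```

Returning plain strings keeps the helper convenient for quick experiments; code that needs the matched nodes themselves could return the `NodeList` directly.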