jsoup 1.7.3 Released: An Open-Source HTML Parser for Java
jopen, 11 years ago
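jsoup is an HTML parser for Java. It offers a convenient API for extracting and manipulating HTML documents, and lets you locate nodes and read their data using DOM traversal, CSS selectors, and jQuery-like methods.
Its select and pipeline APIs are as powerful as jQuery's.
jsoup 1.7.3 has been released. This version introduces improved form handling and more robust character-set detection, speed and memory optimizations in parsing and CSS selector matching, and a number of bug fixes.
The detailed changes are as follows: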
Improvements:
- Added the element type FormElement, to facilitate simple form submissions. Find forms in a doc using Elements.forms(), then prepare a form for submission with FormElement.submit().
- Improved the reliability of HTTP character-set recognition from response headers, particularly when servers return out-of-spec responses.
- Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.
- Large decrease in the number of temporary objects created during parsing, leading to less GC load (helpful particularly on Android), and faster parsing.
- Improved the time to match elements with common CSS selectors by ~27%.
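The two new API additions listed above, FormElement and Document.location(), can be sketched roughly as follows. This is a minimal illustration rather than an official example: the markup, the base URL http://example.com/account/, and the class and method names (FormSketch, parseSample, prepareFirstForm) are assumptions made up for the demo. The document is parsed from a string so nothing is sent over the network, and FormElement.submit() only prepares a Connection.

```java
import java.util.List;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormSketch {
    // Hypothetical base URL and markup, used only for illustration.
    static final String BASE_URL = "http://example.com/account/";
    static final String HTML =
        "<form action='login' method='post'>"
      + "<input type='text' name='user' value='alice'>"
      + "<input type='hidden' name='token' value='abc123'>"
      + "</form>";

    /** Parses the sample markup; the base URI is reported by Document.location(). */
    static Document parseSample() {
        return Jsoup.parse(HTML, BASE_URL);
    }

    /** Finds the first form and prepares (but does not send) its submission. */
    static Connection prepareFirstForm(Document doc) {
        List<FormElement> forms = doc.select("form").forms();
        FormElement form = forms.get(0);
        // submit() resolves the action URL against the base URI and copies the form data.
        return form.submit();
    }

    public static void main(String[] args) {
        Document doc = parseSample();
        System.out.println("Location: " + doc.location());

        Connection con = prepareFirstForm(doc);
        System.out.println(con.request().method() + " " + con.request().url());
        for (Connection.KeyVal kv : con.request().data()) {
            System.out.println(kv.key() + "=" + kv.value());
        }
    }
}
```

Calling execute() on the returned Connection would actually send the request. When the document comes from Jsoup.connect(...).get() instead of a string, location() reports the final URL after any redirects.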
Bug Fixes:
- Fixed support for self-closing script tags.
- Fixed a crash when reading an unterminated CDATA section.
- Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.
- Fixed an issue when cloning a document with extremely nested elements that could cause a stack-overflow.
- Fixed an issue when connecting or redirecting to a URL that contains a space.
A Chinese translation of the cookbook is hosted on this site: http://www.open-open.com/Jsoup/
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
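To show what can be done with the selected elements, here is a small offline sketch that applies the same "#mp-itn b a" selector to a hand-written fragment and collects each link's text and absolute URL. The markup is a hypothetical stand-in (the live page's structure may differ), and the class and method names (HeadlineSketch, headlines) are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HeadlineSketch {
    // Hypothetical stand-in markup; not copied from the real page.
    static final String HTML =
        "<div id='mp-itn'><ul>"
      + "<li><b><a href='/wiki/First'>First story</a></b> happens.</li>"
      + "<li><b><a href='/wiki/Second'>Second story</a></b> follows.</li>"
      + "<li><a href='/wiki/Other'>Not bold</a></li>"
      + "</ul></div>";

    /** Returns "text -> absolute URL" for every link matched by the selector. */
    static List<String> headlines(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        // Links inside <b> inside the element with id "mp-itn".
        Elements links = doc.select("#mp-itn b a");
        List<String> out = new ArrayList<String>();
        for (Element link : links) {
            // absUrl resolves the relative href against the base URI.
            out.add(link.text() + " -> " + link.absUrl("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : headlines(HTML, "http://en.wikipedia.org/")) {
            System.out.println(line);
        }
    }
}
```

The third link is skipped because it is not wrapped in a b element, which is how the descendant selector narrows the match.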