Java-based open-source HTML parser: jsoup 1.7.3 released

jopen, 11 years ago

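jsoup is an HTML parser for Java. It provides convenient methods for extracting and manipulating HTML documents, and lets you locate nodes and read their data by combining DOM traversal, CSS selectors, and jQuery-like methods.
Its select and pipeline APIs are as powerful as jQuery's.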
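As a rough illustration of how DOM-style lookups, CSS selectors, and jQuery-like setters can be combined, here is a minimal sketch. The sample markup, the class name JsoupIntroSketch, and the example.com base URI are assumptions made for this example; only the jsoup calls themselves (Jsoup.parse, getElementById, select, attr, append, absUrl) come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupIntroSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<div id=\"main\"><p class=\"lead\">Hello <a href=\"/docs\">docs</a></p></div>";

    // Parses the sample, then reads and edits it with DOM-style and jQuery-style calls.
    static Document parseAndEdit() {
        // The base URI (an assumed example address) lets jsoup resolve relative links.
        Document doc = Jsoup.parse(HTML, "http://example.com/");
        Element main = doc.getElementById("main");       // DOM-style lookup by id
        Element link = main.select("p.lead a").first();  // CSS selector lookup
        link.attr("rel", "nofollow");                    // jQuery-style attribute setter
        main.append("<p>Added paragraph</p>");           // jQuery-style content insertion
        return doc;
    }

    public static void main(String[] args) {
        Document doc = parseAndEdit();
        // Resolve the relative href against the base URI.
        System.out.println(doc.select("a").first().absUrl("href"));
        // Count the paragraphs now inside #main.
        System.out.println(doc.select("#main p").size());
    }
}
```

Parsing from a string keeps the sketch independent of any network access; the same calls apply to a document fetched with Jsoup.connect.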

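jsoup 1.7.3 has been released. This version introduces improved form handling and more robust character-set detection, optimizes the speed and memory use of parsing and CSS selectors, and includes a number of bug fixes.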

The detailed changes are as follows:

Improvements:

- Added the element type FormElement, to facilitate simple form submissions. Find forms in a doc using Elements.forms(), then prepare it for submission with FormElement.submit().

- Improved the reliability of HTTP character-set recognition from response headers, particularly for when servers return out-of-spec responses.

- Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.

- Large decrease in the amount of temporary objects created during parsing, leading to less GC load (helpful particularly on Android), and faster parsing.

- Improved the time to match elements with common CSS selectors by ~27%.
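The form handling and Document.location() items above can be sketched roughly as follows. The login form, its field names, the class name FormSketch, and the example.com URLs are hypothetical and exist only for this example. Note that the document here is parsed from a string, so location() simply reflects the base URI passed to Jsoup.parse; for a fetched document it would be the URL the response actually came from. FormElement.submit() only prepares a Connection, so nothing is sent unless the caller executes it.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

import java.util.ArrayList;
import java.util.List;

public class FormSketch {
    // Hypothetical login form; field names and URLs are illustrative only.
    static final String HTML =
        "<form action=\"/login\" method=\"post\">"
        + "<input type=\"text\" name=\"user\" value=\"alice\">"
        + "<input type=\"hidden\" name=\"token\" value=\"abc123\">"
        + "</form>";

    // Parses the sample with an assumed base URI so the form action can be resolved.
    static Document parse() {
        return Jsoup.parse(HTML, "http://example.com/start");
    }

    // Collects "key=value" pairs from the first form's data set.
    static List<String> formFields(Document doc) {
        FormElement form = doc.select("form").forms().get(0);
        List<String> fields = new ArrayList<String>();
        for (Connection.KeyVal kv : form.formData()) {
            fields.add(kv.key() + "=" + kv.value());
        }
        return fields;
    }

    // Builds, but does not execute, the request for the first form.
    static Connection prepare(Document doc) {
        FormElement form = doc.select("form").forms().get(0);
        return form.submit();
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(doc.location());
        System.out.println(formFields(doc));
        Connection con = prepare(doc);
        System.out.println(con.request().method() + " " + con.request().url());
        // Calling con.execute() or con.post() would actually send the request.
    }
}
```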

Bug Fixes:

- Fixed support for self-closing script tags.

- Fixed a crash when reading an unterminated CDATA section.

- Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.

- Fixed an issue when cloning a document with extremely nested elements that could cause a stack-overflow.

- Fixed an issue when connecting or redirecting to a URL that contains a space.

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");
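The snippet above needs network access and depends on the live page's markup. A minimal offline sketch of the same selector, run against a hypothetical fragment that imitates the targeted structure (the markup and the class name HeadlineSketch are assumptions, not the real Wikipedia page), might look like this:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class HeadlineSketch {
    // Hypothetical stand-in for the page fragment the selector targets.
    static final String HTML =
        "<div id=\"mp-itn\"><ul>"
        + "<li><b><a href=\"/wiki/First\">First story</a></b>"
        + " and <a href=\"/wiki/Other\">other</a></li>"
        + "<li><b><a href=\"/wiki/Second\">Second story</a></b></li>"
        + "</ul></div>";

    // Returns "text -> absolute URL" for each link nested in <b> under #mp-itn.
    static List<String> headlines() {
        Document doc = Jsoup.parse(HTML, "http://en.wikipedia.org/");
        List<String> out = new ArrayList<String>();
        // "#mp-itn b a" matches <a> elements inside <b> elements inside the element with id mp-itn.
        for (Element a : doc.select("#mp-itn b a")) {
            out.add(a.text() + " -> " + a.absUrl("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : headlines()) {
            System.out.println(line);
        }
    }
}
```

The link that is not wrapped in a b element is skipped, which shows how the descendant selector narrows the match to the bolded headline links.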

A Chinese translation of the cookbook is hosted on this site: http://www.open-open.com/Jsoup/
