HtmlCleaner: an HTML document parser
jopen
13 years ago
<a href="/misc/goto?guid=4959499543027097061"> <img border="0" alt="HtmlCleaner HTML document parser" src="https://simg.open-open.com/show/8f7be556bdf74662b8e12a915d9deeb6.jpg" width="198" height="53" /> </a> <br /> HtmlCleaner is an open-source HTML parser written in Java. It reorders the individual elements of an HTML document and produces well-formed HTML. By default it follows rules similar to those most web browsers use when building the document object model. Users can also supply their own tag and rule sets for filtering and matching. <br /> <h3>Features:</h3> <ul> <li>HtmlCleaner parses input HTML and generates a tree structure suitable for programmatic manipulation.</li> <li>Serializers are responsible for outputting the DOM structure to XML, HTML, DOM or JDom.</li> <li>The parsing phase relies on tag descriptions, which can be customized by the user.</li> <li>HtmlCleaner's behaviour can be configured through a number of parameters.</li> <li>HtmlCleaner is thread safe, meaning that a single instance can clean multiple HTML sources at the same time.</li> <li>HtmlCleaner can be used from Java code, from the command line or as an Ant task.</li> <li>HtmlCleaner requires JRE 1.5+.</li> </ul> <p><strong>Project home page:</strong><a href="http://www.open-open.com/lib/view/home/1324371733999" target="_blank">http://www.open-open.com/lib/view/home/1324371733999</a></p>
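<p>The parse, configure and serialize steps listed above can be sketched from Java roughly as follows. This is a minimal sketch, not taken from the original post: it assumes HtmlCleaner 2.2 or later is on the classpath, and the class name <code>HtmlCleanerSketch</code>, the helper <code>toWellFormedXml</code> and the sample markup are hypothetical names chosen for illustration.</p>

```java
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.PrettyXmlSerializer;
import org.htmlcleaner.TagNode;

// Hypothetical demo class; assumes the HtmlCleaner 2.2+ jar is on the classpath.
public class HtmlCleanerSketch {

    // Parses possibly malformed HTML and serializes the cleaned tree as XML.
    static String toWellFormedXml(String html) {
        HtmlCleaner cleaner = new HtmlCleaner();

        // Behaviour is tuned through CleanerProperties; here comments are dropped.
        CleanerProperties props = cleaner.getProperties();
        props.setOmitComments(true);

        // clean() builds the tree structure (TagNode) from the input.
        TagNode root = cleaner.clean(html);

        // A serializer turns the tree back into text, in this case indented XML.
        return new PrettyXmlSerializer(props).getAsString(root);
    }

    public static void main(String[] args) {
        // Sample input with unclosed <p> and <li> tags and a comment.
        String messy = "<!-- note --><p>Unclosed paragraph<ul><li>first<li>second</ul>";
        System.out.println(toWellFormedXml(messy));
    }
}
```

<p>The cleaner and its properties are created per call here only to keep the sketch short. Since the library is described as thread safe, a single configured instance could presumably be shared instead.</p>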