HTMLParser: an HTML Document Parser

jopen, 13 years ago
     <p><img style="width:95px;height:70px;" title="htmlparserlogo.jpg" border="0" alt="htmlparserlogo.jpg" src="https://simg.open-open.com/show/e0b5404848e225b79cdc40bd85420ec0.jpg" width="157" height="132" /><br /> HTML Parser is a fast, real-time parser for analyzing HTML. The latest release is 1.6; the 2.0 development version has seen no progress for two years. It can be used for:</p>
    <ul>
     <li>URL rewriting, modifying some or all links on a page</li>
     <li>site capture, moving content from the web to local disk</li>
     <li>censorship, removing offending words and phrases from pages</li>
     <li>HTML cleanup, correcting erroneous pages</li>
     <li>ad removal, excising URLs referencing advertising</li>
     <li>conversion to XML, moving existing web pages to XML</li>
    </ul>
    <p>Sample code (the <code>Parser</code> constructor and <code>parse</code> can throw <code>ParserException</code>):</p>
    <pre class="brush:java; toolbar: true; auto-links: false;">import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;

Parser parser = new Parser ("http://whatever");  // connect to the page
NodeList list = parser.parse (null);             // null filter: keep every top-level node
Node node = list.elementAt (0);                  // first top-level node
NodeList sublist = node.getChildren ();          // null when the node has no children
System.out.println (sublist == null ? 0 : sublist.size ());</pre>
    <br />
    <p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1324372600983" target="_blank">http://www.open-open.com/lib/view/home/1324372600983</a></p>
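    <p>URL rewriting and site capture, the first two uses in the list above, both start by enumerating the links on a page and resolving them against the page address. The sketch below illustrates only that step. It deliberately does not use HTML Parser: it assumes the JDK's built-in <code>javax.swing.text.html</code> parser so that it runs without extra jars. The names <code>LinkLister</code> and <code>extractLinks</code> and the example URLs are hypothetical.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper, not part of HTML Parser: lists the links in a page
// using the HTML parser that ships with the JDK.
final class LinkLister {

    /** Returns every a/href value in the markup, resolved against baseUrl. */
    static List<String> extractLinks(String html, String baseUrl) {
        final URI base = URI.create(baseUrl);
        final List<String> links = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag != HTML.Tag.A) {
                    return; // only anchor elements carry the links we want
                }
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    return; // e.g. a named anchor without a target
                }
                try {
                    links.add(base.resolve(href.toString()).toString());
                } catch (IllegalArgumentException e) {
                    // Skip href values that are not valid URIs.
                }
            }
        };

        try {
            // 'true' tells the parser to ignore any charset declared in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"/docs\">Docs</a> "
                + "<a href=\"http://example.org/x\">X</a></body></html>";
        // Prints http://example.com/docs, then http://example.org/x
        for (String link : extractLinks(html, "http://example.com/index.html")) {
            System.out.println(link);
        }
    }
}
```

    <p>A rewriting or capture tool would then map each resolved address to its new target, such as a local file path. With HTML Parser itself, the same step would likely operate on the link nodes of the parsed tree rather than on a callback.</p>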