HTML document parser: HTMLParser
jopen
13 years ago
<p><img style="width:95px;height:70px;" title="htmlparserlogo.jpg" border="0" alt="htmlparserlogo.jpg" src="https://simg.open-open.com/show/e0b5404848e225b79cdc40bd85420ec0.jpg" width="157" height="132" /><br /> HTML Parser is a fast, real-time parser for analyzing HTML. The latest release is 1.6; the 2.0 development version has seen no progress for two years. It can be used for:</p>
<ul>
<li>URL rewriting, modifying some or all links on a page</li>
<li>site capture, moving content from the web to local disk</li>
<li>censorship, removing offending words and phrases from pages</li>
<li>HTML cleanup, correcting erroneous pages</li>
<li>ad removal, excising URLs referencing advertising</li>
<li>conversion to XML, moving existing web pages to XML</li>
</ul>
<p>Sample code:<br /> </p>
<pre class="brush:java; toolbar: true; auto-links: false;">import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.util.NodeList;

// Fragment: place inside a method that declares "throws ParserException".
Parser parser = new Parser("http://whatever");  // placeholder URL
NodeList list = parser.parse(null);             // null filter: keep every top-level node
Node node = list.elementAt(0);
NodeList sublist = node.getChildren();          // may be null when the node has no children
System.out.println(sublist.size());</pre>
<br />
<p><strong>Project home page:</strong><a href="http://www.open-open.com/lib/view/home/1324372600983" target="_blank">http://www.open-open.com/lib/view/home/1324372600983</a></p>
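<p>Several of the uses listed above (URL rewriting, ad removal, site capture) begin the same way: walk the parsed tags and read the <code>href</code> of each link. The sketch below is a rough, dependency-free illustration of that first step. It does not use HTML Parser itself; it swaps in the JDK's built-in <code>javax.swing.text.html</code> parser so that it runs without extra libraries. The class name <code>LinkCollector</code> and the sample markup are made up for illustration.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href of every <a> tag in a piece of HTML.
// Uses the JDK's own HTML parser, not the HTML Parser library described above.
class LinkCollector {

    static List<String> collect(String html) {
        List<String> links = new ArrayList<>();

        // The callback is invoked once per start tag as the parser walks the markup.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String sample = "<p><a href=\"a.html\">A</a> <a href=\"http://example.com/b\">B</a></p>";
        for (String link : collect(sample)) {
            System.out.println(link);
        }
    }
}
```

<p>Once the links are collected, a rewriter would map each one to its new target and an ad remover would drop those matching a block list. With HTML Parser the same walk would go through its own node and filter classes instead of a callback.</p>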