HTML4J: A Java Library for Parsing HTML
jopen
13 years ago
<div id="p_fullcontent" class="detail">
<p>HTML4J is a Java library for parsing HTML. Sample code:<br /> </p>
<pre class="brush:java; toolbar: true; auto-links: false;">
Reader re = ...

// Create the document
HTMLDoc doc = new HTMLDoc();

// Load its content
doc.load(re);

// Get the HTML
HTMLFragment html = doc.getHTML();

// Create a 'date' meta-tag
HTMLTag tag = HTMLTag.parse("&lt;meta name=\"date\" content=21/01/2001&gt;");

// Insert it just before the title
html.insertBefore(html.findTagByName("title"), tag);

// Create a paragraph
tag = HTMLTag.create("p");

// Insert '&lt;p&gt;Paragraph&lt;/p&gt;' just before the tag with id="someid"
html.insertBefore(html.getIdFinder("someid").getTag().getPosition(),
                  tag.toString("Paragraph"));

// Create an anchor to foo.html
HTMLTag anchor = HTMLTag.parse("&lt;a href=\"foo.html\"&gt;");
// We could also call 'HTMLTag.create("a")' and then set the 'href'
// attribute using getAttributes().setAttribute("href", "foo.html")

// Now we get the tag block with id="otherid"
tag = html.getIdFinder("otherid").getTagBlock();

// Replace the tag that has id="otherid" with the same tag
// wrapped in the foo.html anchor
html.replace(tag.getBlockPosition(), anchor.toString(tag));
// For example, if the 'otherid' tag was 'img src="something.jpg"',
// then the result would be:
// '&lt;a href="foo.html"&gt;&lt;img id="otherid" src="something.jpg"&gt;&lt;/a&gt;'

tag = html.getTagByName("meta");
// We just got the first 'meta' tag found in the document. Now we
// set its name attribute to 'last_update' and its value
// (the 'content' attribute) to "20/01/2001"
tag.getAttributes().setAttribute("name", "last_update");
tag.getAttributes().setAttribute("content", "20/01/2001");

// Commit the changes made to the 'meta' tag to the document
html.update(tag);
</pre>
<p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1324394802218" target="_blank">http://www.open-open.com/lib/view/home/1324394802218</a></p>
</div>
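The sample above leaves the `Reader re = ...` line elided. As a minimal sketch, assuming the document only needs a standard `java.io.Reader`, here are two common ways to obtain one using nothing but the JDK. The class `ReaderSources` and its helper methods are hypothetical names made up for this illustration; they are not part of HTML4J.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class (not part of HTML4J): builds java.io.Reader
// instances that could be passed to a call such as doc.load(re).
public class ReaderSources {

    // Wrap an in-memory HTML string in a Reader.
    static Reader fromString(String html) {
        return new StringReader(html);
    }

    // Open an HTML file as a buffered, UTF-8 decoding Reader.
    // Assumes the file is UTF-8 encoded; adjust the charset if it is not.
    static Reader fromFile(Path path) {
        try {
            return Files.newBufferedReader(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drain a Reader into a String and close it; handy for checking
    // what a parser would actually receive.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Reader re = fromString("<html><head><title>Demo</title></head></html>");
        System.out.println(readAll(re));
    }
}
```

Either Reader could then take the place of `re` in the sample; remember to close it once loading has finished.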