Jericho HTML Parser
jopen
13 years ago
<p>The Jericho HTML Parser is a Java library for analyzing and manipulating parts of an HTML document, including server-side tags, while reproducing any unrecognized or invalid HTML verbatim. It also provides high-level functions for manipulating HTML forms.</p>
<p>Sample code (prints a document's title and the encoding information that Jericho detects):</p>
<pre class="brush:java; toolbar: true; auto-links: false;">import net.htmlparser.jericho.*;
import java.util.*;
import java.io.*;
import java.net.*;

public class Encoding {
    public static void main(String[] args) throws Exception {
        // Use the first command-line argument as the source, or fall back to a default file.
        String sourceUrlString = "data/test.html";
        if (args.length == 0) {
            System.err.println("Using default argument of \"" + sourceUrlString + '"');
        } else {
            sourceUrlString = args[0];
        }
        // An argument without a ':' is treated as a local file path.
        if (sourceUrlString.indexOf(':') == -1) {
            sourceUrlString = "file:" + sourceUrlString;
        }
        System.out.println("\nSource URL:");
        System.out.println(sourceUrlString);

        // Load the document from the URL.
        URL url = new URL(sourceUrlString);
        Source source = new Source(url);

        // Print the content of the first TITLE element, if there is one.
        System.out.println("\nDocument Title:");
        Element titleElement = source.getFirstElement(HTMLElementName.TITLE);
        System.out.println(titleElement != null ? titleElement.getContent().toString() : "(none)");

        // Print the detected encoding and how it was determined.
        System.out.println("\nSource.getEncoding():");
        System.out.println(source.getEncoding());
        System.out.println("\nSource.getEncodingSpecificationInfo():");
        System.out.println(source.getEncodingSpecificationInfo());
        System.out.println("\nSource.getPreliminaryEncodingInfo():");
        System.out.println(source.getPreliminaryEncodingInfo());
    }
}</pre>
<p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1324433058296" target="_blank">http://www.open-open.com/lib/view/home/1324433058296</a></p>
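The sample above decides whether its argument is a URL or a local file by looking for a ':' character. The sketch below isolates that step using only the standard library, so it can be tried without Jericho on the classpath. `SourceLocation`, `toUrlString`, and `toUrl` are hypothetical names introduced here for illustration; they are not part of Jericho or of the original sample.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of Jericho): mirrors how the sample
// turns a command-line argument into a URL string.
final class SourceLocation {
    private SourceLocation() {}

    // An argument without ':' is assumed to be a local file path,
    // so it is prefixed with the "file:" scheme; anything else is returned unchanged.
    static String toUrlString(String argument) {
        return argument.indexOf(':') == -1 ? "file:" + argument : argument;
    }

    // Builds a java.net.URL from the normalized string.
    static URL toUrl(String argument) throws MalformedURLException {
        return new URL(toUrlString(argument));
    }

    public static void main(String[] args) throws MalformedURLException {
        // A relative file path and an absolute HTTP URL (example.com is a placeholder).
        System.out.println(toUrl("data/test.html"));
        System.out.println(toUrl("http://example.com/index.html"));
    }
}
```

One caveat of this check: a Windows path such as `C:\docs\test.html` already contains a ':' and would be passed through unchanged, so `new URL(...)` would likely reject it. If such paths matter, converting the path with `java.io.File.toURI()` may be a more robust choice.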