html5lib: an HTML parsing library
jopen
13 years ago
html5lib is a library for Ruby and Python that parses HTML documents. It supports HTML5 and aims for maximum compatibility with desktop browsers. <p>Key features include:</p> <ul> <li><a id="0.11_Release_Features">Parses both valid and invalid HTML documents into a tree</a></li> <li><a id="0.11_Release_Features">Support for <strong>minidom</strong>, <strong>ElementTree</strong> (including <strong>cElementTree</strong> and <strong>lxml.etree</strong>), <strong>BeautifulSoup</strong> and custom <strong>simpletree</strong> output formats </a></li> <li><a id="0.11_Release_Features"><strong>DOM</strong> to <strong>SAX</strong> converter </a></li> <li><a id="0.11_Release_Features">Reports parse errors </a></li> <li><a id="0.11_Release_Features">Character encoding detection </a></li> <li><a id="0.11_Release_Features">XML mode for working with ill-formed XML, e.g. feeds </a></li> <li><a id="0.11_Release_Features">Filtering and serializing of trees </a></li> <li><a id="0.11_Release_Features">HTML+CSS sanitizer </a></li> <li><a id="0.11_Release_Features">Extensive unit tests </a></li> <li><a id="0.11_Release_Features">Faster than previous versions <br /> </a></li> </ul> <p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1324371284280" target="_blank">http://www.open-open.com/lib/view/home/1324371284280</a></p>
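To make the first few features concrete (parsing invalid markup into a tree, ElementTree output, and parse error reporting), here is a minimal sketch in Python. It assumes the Python html5lib package is installed (for example via pip) and reflects the API of recent releases, which may differ from the version described above. The sample markup and the names BROKEN_HTML and parse_to_etree are illustrative, not taken from the original post.

```python
import html5lib

# Deliberately invalid markup (illustrative): no doctype, no <html>/<body>,
# and the <p> and <b> elements are never closed.
BROKEN_HTML = "<title>Demo</title><p>Hello <b>world"


def parse_to_etree(markup):
    """Parse markup into an ElementTree element and return it with the parse errors.

    Hypothetical helper for this example; it only wraps html5lib's HTMLParser.
    """
    parser = html5lib.HTMLParser(
        tree=html5lib.getTreeBuilder("etree"),  # build an xml.etree.ElementTree tree
        namespaceHTMLElements=False,  # plain tag names ("p") instead of "{namespace}p"
    )
    root = parser.parse(markup)
    # parser.errors holds the parse errors recorded for the most recent parse.
    return root, parser.errors


root, errors = parse_to_etree(BROKEN_HTML)

# The parser fills in the missing <html>, <head> and <body> elements,
# much as a browser would, so ordinary ElementTree lookups work.
print(root.tag)
print(root.find("head/title").text)
print("".join(root.find("body/p").itertext()))
print("parse errors recorded:", len(errors))
```

Passing "dom" or "lxml" to getTreeBuilder instead of "etree" should select one of the other output formats from the list, provided the corresponding library is available.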