openxmllib: a Python library for working with OpenXML documents
fmms
13 years ago
<p><strong>openxmllib</strong> is a Python library for working with OpenXML documents. It requires lxml.</p>
<p>The Office Open XML format is built on the Open Packaging Conventions, which the XML Paper Specification (XPS) also uses. The two formats nevertheless differ in many important ways. XPS is a paginated, fixed-layout document format introduced with Microsoft Windows Vista, whereas Office Open XML is a fully editable file format for Office Word 2007, Office Excel 2007, and Office PowerPoint 2007. Although both rely on XML and ZIP compression, they differ considerably in their design and intended use.<br /> </p>
<pre class="brush:python; toolbar: true; auto-links: false;">
These examples say it all::

  >>> import openxmllib
  >>> doc = openxmllib.openXmlDocument(path='office.docx')
  >>> # Raises a ValueError on unsupported office files.
  >>> doc.mimeType
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  >>> doc.coreProperties # Keys may depend on application
  {'title': u'blah...', u'creator': u'John Doe', ...}
  >>> doc.extendedProperties # Keys may depend on application
  {'Words': u'312', 'Application': u'Your favorite word processor', ...}
  >>> doc.customProperties # May return an empty mapping
  {'My property': u'My value', ...}
  >>> doc.allProperties # Merges core+extended+custom properties (see above)
  {...}
  >>> doc.indexableText(include_properties=False)
  u'all the words of that document body'
  >>> doc.indexableText(include_properties=True)
  u'all the words of that document body and all properties values'

Standard ``mimetypes`` package extensions ::

  >>> import mimetypes
  >>> mimetypes.guess_type('somedoc.docx')
  ('application/vnd.openxmlformats-officedocument.wordprocessingml.document', None)
  >>> mimetypes.guess_type('somecalc.xlsx')
  ('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', None)
  >>> mimetypes.guess_type('someslides.pptx')
  ('application/vnd.openxmlformats-officedocument.presentationml.presentation', None)

Document factory signatures::

  >>> # We have the path for the office file
  >>> doc = openxmllib.openXmlDocument(path='office.docx')
  >>> # We have a file object for the office file
  >>> fh = open('office.docx', 'rb')
  >>> doc = openxmllib.openXmlDocument(file_=fh)
  >>> # We have the URL for the office file
  >>> doc = openxmllib.openXmlDocument(url='http://domain.tld/office.docx')
  >>> # We have the raw data of the office file
  >>> import mimetypes
  >>> # guess_type() returns a (type, encoding) tuple; keep only the type
  >>> docx_mimetype = mimetypes.guess_type('office.docx')[0]
  >>> body = open('office.docx', 'rb').read()
  >>> doc = openxmllib.openXmlDocument(data=body, mime_type=docx_mimetype)

Note that if you're not running a Python application, you may get the
indexable text from a document with the `openxmlinfo.py` console utility.
Just type::</pre>
<br />
<p><strong>Project homepage:</strong> <a href="http://www.open-open.com/lib/view/home/1326939723124" target="_blank">http://www.open-open.com/lib/view/home/1326939723124</a></p>
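<p>The ZIP-plus-XML packaging described above can be seen without openxmllib at all. The following is a minimal sketch using only the Python standard library, not the way openxmllib itself is implemented (it relies on lxml). It assumes the core properties live in the package part <code>docProps/core.xml</code>, as they typically do in Office Open XML files. The helper names <code>build_sample_package</code> and <code>read_core_properties</code> are hypothetical, and the in-memory package is only a stand-in, not a valid <code>.docx</code>.</p>

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Stand-in for docProps/core.xml (assumed part name; sample values are made up).
CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Sample title</dc:title>'
    '<dc:creator>John Doe</dc:creator>'
    '</cp:coreProperties>'
)


def build_sample_package():
    """Build a tiny ZIP package in memory (hypothetical; not a valid .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("docProps/core.xml", CORE_XML)
    buf.seek(0)
    return buf


def read_core_properties(fileobj):
    """Return {local tag name: text} for each child of docProps/core.xml."""
    with zipfile.ZipFile(fileobj) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    # Tags look like '{namespace}title'; keep only the part after '}'.
    return {child.tag.split("}", 1)[-1]: child.text for child in root}


props = read_core_properties(build_sample_package())
print(props)
```

<p>With a real file, one would pass a file object opened in <code>'rb'</code> mode instead of the sample package. The property keys found will depend on the application that wrote the document.</p>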