openxmllib: a Python library for working with OpenXML documents

fmms, 13 years ago
     <p><strong>openxmllib</strong> is a Python library for working with OpenXML documents. It requires lxml.</p>
     <p>The Office Open XML formats use the Open Packaging Conventions, as does the XML Paper Specification (XPS). The two formats nevertheless differ in important ways. XPS is a paginated, fixed-layout document format introduced with Microsoft Windows Vista, whereas Office Open XML is a fully editable file format for Office Word 2007, Office Excel 2007, and Office PowerPoint 2007. Although both rely on XML and ZIP compression, they differ considerably in design and intended use.</p>
     <pre class="brush:python; toolbar: true; auto-links: false;">These examples say all::

  >>> import openxmllib
  >>> doc = openxmllib.openXmlDocument(path='office.docx')
  >>> # Raises a ValueError on unsupported office files.
  >>> doc.mimeType
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  >>> doc.coreProperties # Keys may depend on application
  {'title': u'blah...', u'creator': u'John Doe', ...}
  >>> doc.extendedProperties # Keys may depend on application
  {'Words': u'312', 'Application': u'Your favorite word processor', ...}
  >>> doc.customProperties # May return an empty mapping
  {'My property': u'My value', ...}
  >>> doc.allProperties # Merges core+extended+custom properties (see above)
  {...}
  >>> doc.indexableText(include_properties=False)
  u'all the words of that document body'
  >>> doc.indexableText(include_properties=True)
  u'all the words of that document body and all properties values'

Standard ``mimetypes`` package extensions::

  >>> import mimetypes
  >>> mimetypes.guess_type('somedoc.docx')
  ('application/vnd.openxmlformats-officedocument.wordprocessingml.document', None)
  >>> mimetypes.guess_type('somecalc.xlsx')
  ('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', None)
  >>> mimetypes.guess_type('someslides.pptx')
  ('application/vnd.openxmlformats-officedocument.presentationml.presentation', None)

Document factory signatures::

  >>> # We have the path for the office file
  >>> doc = openxmllib.openXmlDocument(path='office.docx')
  >>> # We have a file object for the office file
  >>> fh = open('office.docx', 'rb')
  >>> doc = openxmllib.openXmlDocument(file_=fh)
  >>> # We have the URL for the office file
  >>> doc = openxmllib.openXmlDocument(url='http://domain.tld/office.docx')
  >>> # We have the raw data of the office file
  >>> import mimetypes
  >>> # guess_type returns a (type, encoding) tuple; keep only the type
  >>> docx_mimetype, _ = mimetypes.guess_type('office.docx')
  >>> body = open('office.docx', 'rb').read()
  >>> doc = openxmllib.openXmlDocument(data=body, mime_type=docx_mimetype)

Note that if you're not running a Python application, you may get the indexable text from a document with the `openxmlinfo.py` console utility. Just type::</pre>
     <br />
     <p><strong>Project home page:</strong> <a href="http://www.open-open.com/lib/view/home/1326939723124" target="_blank">http://www.open-open.com/lib/view/home/1326939723124</a></p>
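     <p>Because an Office Open XML file is a ZIP package of XML parts, its core properties can also be inspected with the standard library alone. The following is a minimal sketch of that idea, not a description of how openxmllib itself is implemented. It assumes the core properties part lives at the conventional path <code>docProps/core.xml</code> (a real package locates it through its relationships part), and the helper names <code>build_sample_package</code> and <code>read_core_properties</code> are hypothetical. The sample package is built in memory and is not a valid .docx file.</p>

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core properties part of an OPC package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample content, mirroring the property values shown above.
CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
    "<dc:title>blah</dc:title>"
    "<dc:creator>John Doe</dc:creator>"
    "</cp:coreProperties>"
).format(**NS)


def build_sample_package():
    """Build an in-memory ZIP holding only a core properties part.

    Assumption: this is an illustration, not a complete .docx package.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("docProps/core.xml", CORE_XML)
    buf.seek(0)
    return buf


def read_core_properties(fileobj):
    """Return {local tag name: text} for each child of docProps/core.xml."""
    with zipfile.ZipFile(fileobj) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    # ElementTree tags look like '{namespace}name'; keep only 'name'.
    return {child.tag.split("}", 1)[1]: child.text for child in root}


props = read_core_properties(build_sample_package())
print(props)
```

     <p>To try this on a real document, pass an open binary file object (for example <code>open('office.docx', 'rb')</code>) to <code>read_core_properties</code> instead of the in-memory sample. The set of keys returned depends on the application that wrote the file.</p>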