Java 中常用的几种 DOCX 转 PDF 方法

damoo 8年前
   <h2>DOCX2PDF</h2>    <p>将DOCX文档转化为PDF是项目中常见的需求之一,目前主流的方法可以分为两大类,一类是利用各种Office应用进行转换,譬如Microsoft Office、WPS以及LiberOffice,另一种是利用各种语言提供的对于Office文档读取的接口(譬如Apache POI)然后使用专门的PDFGenerator库,譬如IText进行PDF构建。总的来说,从样式上利用Office应用可以保证较好的样式,不过相对而言效率会比较低。其中Microsoft Office涉及版权,不可轻易使用(笔者所在公司就被抓包了),WPS目前使用比较广泛,不过存在超链接截断问题,即超过256个字符的超链接会被截断,LiberOffice的样式排版相对比较随意。而利用POI接口进行读取与生成的方式性能较好,适用于对于格式要求不是很高的情况。另外还有一些封装好的在线工具或者命令行工具,譬如 <a href="/misc/goto?guid=4959713644231208911" rel="nofollow,noindex">docx2pdf</a> 与 <a href="/misc/goto?guid=4959713644330119802" rel="nofollow,noindex">OfficeToPDF</a> 。</p>    <h2>MicroSoft Office</h2>    <p>本部分的核心代码如下,全部代码参考 <a href="/misc/goto?guid=4959713644400739608" rel="nofollow,noindex">这里</a> :</p>    <pre>  <code class="language-java">private ActiveXComponent oleComponent = null;  private Dispatch activeDoc = null;  private final static String APP_ID = "Word.Application";    // Constants that map onto Word's WdSaveOptions enumeration and that  // may be passed to the close(int) method  public static final int DO_NOT_SAVE_CHANGES = 0;  public static final int PROMPT_TO_SAVE_CHANGES = -2;  public static final int SAVE_CHANGES = -1;    // These constant values determine whether or not tha application  // instance will be displyed on the users screen or not.  public static final boolean VISIBLE = true;  public static final boolean HIDDEN = false;    /**   * Create a new instance of the JacobWordSearch class using the following   * parameters.   *   * @param visibility A primitive boolean whose value will determine whether   *                   or not the Word application will be visible to the user. Pass true   *                   to display Word, false otherwise.   */  public OfficeConverter(boolean visibility) {      this.oleComponent = new ActiveXComponent(OfficeConverter.APP_ID);      this.oleComponent.setProperty("Visible", new Variant(visibility));  }    /**   * Open ana existing Word document.   *   * @param docName An instance of the String class that encapsulates the   *                path to and name of a valid Word file. Note that there are a few   *                limitations applying to the format of this String; it must specify   *                the absolute path to the file and it must not use the single forward   *                slash to specify the path separator.   */  public void openDoc(String docName) {      Dispatch disp = null;      Variant var = null;      // First get a Dispatch object referencing the Documents collection - for      // collections, think of ArrayLists of objects.      var = Dispatch.get(this.oleComponent, "Documents");      disp = var.getDispatch();      // Now call the Open method on the Documents collection Dispatch object      // to both open the file and add it to the collection. It would be possible      // to open a series of files and access each from the Documents collection      // but for this example, it is simpler to store a reference to the      // active document in a private instance variable.      var = Dispatch.call(disp, "Open", docName);      this.activeDoc = var.getDispatch();  }    /**   * There is more than one way to convert the document into PDF format, you   * can either explicitly use a FileConvertor object or call the   * ExportAsFixedFormat method on the active document. This method opts for   * the latter and calls the ExportAsFixedFormat method passing the name   * of the file along with the integer value of 17. This value maps onto one   * of Word's constants called wdExportFormatPDF and causes the application   * to convert the file into PDF format. If you wanted to do so, for testing   * purposes, you could add another value to the args array, a Boolean value   * of true. This would open the newly converted document automatically.   *   * @param filename   */  public void publishAsPDF(String filename) {      // The code to expoort as a PDF is 17      //Object args = new Object{filename, new Integer(17), new Boolean(true)};      Object args = new Object {          filename, new Integer(17)      } ;      Dispatch.call(this.activeDoc, "ExportAsFixedFormat", args);  }    /**   * Called to close the active document. Note that this method simply   * calls the overloaded closeDoc(int) method passing the value 0 which   * instructs Word to close the document and discard any changes that may   * have been made since the document was opened or edited.   */  public void closeDoc() {      this.closeDoc(JacobWordSearch.DO_NOT_SAVE_CHANGES);  }    /**   * Called to close the active document. It is possible with this overloaded   * version of the close() method to specify what should happen if the user   * has made changes to the document that have not been saved. There are three   * possible value defined by the following manifest constants;   * DO_NOT_SAVE_CHANGES - Close the document and discard any changes   * the user may have made.   * PROMPT_TO_SAVE_CHANGES - Display a prompt to the user asking them   * how to proceed.   * SAVE_CHANGES - Save the changes the user has made to the document.   *   * @param saveOption A primitive integer whose value indicates how the close   *                   operation should proceed if the user has made changes to the active   *                   document. Note that no checks are made on the value passed to   *                   this argument.   */  public void closeDoc(int saveOption) {      Object args = {new Integer(saveOption)};      Dispatch.call(this.activeDoc, "Close", args);  }    /**   * Called once processing has completed in order to close down the instance   * of Word.   */  public void quit() {      Dispatch.call(this.oleComponent, "Quit");  }</code></pre>    <h2>WPS</h2>    <ul>     <li> <p><a href="/misc/goto?guid=4959713644489042772" rel="nofollow,noindex">Java调用WPS或pdfcreator的com接口实现doc转pdf</a></p> </li>    </ul>    <p>本文的核心代码如下,完整代码查看 <a href="/misc/goto?guid=4959713644569421744" rel="nofollow,noindex">这里</a> :</p>    <pre>  <code class="language-java">@Override          public boolean convert(String word, String pdf) {              File pdfFile = new File(pdf);              File wordFile = new File(word);              boolean convertSuccessfully = false;                ActiveXComponent wps = null;              ActiveXComponent doc = null;                  try {                  wps = new ActiveXComponent("KWPS.Application");    //                Dispatch docs = wps.getProperty("Documents").toDispatch();  //                Dispatch d = Dispatch.call(docs, "Open", wordFile.getAbsolutePath(), false, true).toDispatch();  //                Dispatch.call(d, "SaveAs", pdfFile.getAbsolutePath(), 17);  //                Dispatch.call(d, "Close", false);                    doc = wps.invokeGetComponent("Documents")                          .invokeGetComponent("Open", new Variant(wordFile.getAbsolutePath()));                    try {                      doc.invoke("SaveAs",                              new Variant(new File("C:\\Users\\lotuc\\Documents\\mmm.pdf").getAbsolutePath()),                              new Variant(17));                      convertSuccessfully = true;                  } catch (Exception e) {                      logger.warning("生成PDF失败");                      e.printStackTrace();                  }                    File saveAsFile = new File("C:\\Users\\lotuc\\Documents\\saveasfile.doc");                  try {                      doc.invoke("SaveAs", saveAsFile.getAbsolutePath());                      logger.info("成功另存为" + saveAsFile.getAbsolutePath());                  } catch (Exception e) {                      logger.info("另存为" + saveAsFile.getAbsolutePath() + "失败");                      e.printStackTrace();                  }              } finally {                  if (doc == null) {                      logger.info("打开文件 " + wordFile.getAbsolutePath() + " 失败");                  } else {                      try {                          logger.info("释放文件 " + wordFile.getAbsolutePath());                          doc.invoke("Close");                          doc.safeRelease();                      } catch (Exception e1) {                          logger.info("释放文件 " + wordFile.getAbsolutePath() + " 失败");                      }                  }                    if (wps == null) {                      logger.info("加载 WPS 控件失败");                  } else {                      try {                          logger.info("释放 WPS 控件");                          wps.invoke("Quit");                          wps.safeRelease();                      } catch (Exception e1) {                          logger.info("释放 WPS 控件失败");                      }                  }              }                return convertSuccessfully;          }</code></pre>    <h2>LiberOffice</h2>    <ul>     <li> <p><a href="/misc/goto?guid=4959713644650488110" rel="nofollow,noindex">Convert Microsoft Word to PDF - using Java and LibreOffice (UNO API)</a></p> </li>    </ul>    <p>LiberOffice本身提供了一个命令行工具进行转换,在你安装好了LiberOffice之后</p>    <pre>  <code class="language-java">/usr/local/bin/soffice --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc</code></pre>    <p>如果有打开的libreoffice实例, 要穿入env选项指定一个工作目录</p>    <pre>  <code class="language-java">/usr/local/bin/soffice "-env:UserInstallation=file:///tmp/LibreOffice_Conversion_abc" --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc</code></pre>    <p>首先我们需要安装好LiberOffice,然后将依赖的Jar包添加到classpath中:</p>    <pre>  <code class="language-java">Install Libre Office    Create a Java project in your favorite editor and add these to your class path:    [Libre Office Dir]/URE/java/juh.jar    [Libre Office Dir]/URE/java/jurt.jar    [Libre Office Dir]/URE/java/ridl.jar    [Libre Office Dir]/program/classes/unoil.jar</code></pre>    <p>然后我们需要启动一个LiberOffice进程:</p>    <pre>  <code class="language-java">import java.util.Date;  import java.io.File;  import com.sun.star.beans.PropertyValue;  import com.sun.star.comp.helper.Bootstrap;  import com.sun.star.frame.XComponentLoader;  import com.sun.star.frame.XDesktop;  import com.sun.star.frame.XStorable;  import com.sun.star.lang.XComponent;  import com.sun.star.lang.XMultiComponentFactory;  import com.sun.star.text.XTextDocument;  import com.sun.star.uno.UnoRuntime;  import com.sun.star.uno.XComponentContext;  import com.sun.star.util.XReplaceDescriptor;  import com.sun.star.util.XReplaceable;    public class MailMergeExample {    public static void main(String[] args) throws Exception {     // Initialise   XComponentContext xContext = Bootstrap.bootstrap();     XMultiComponentFactory xMCF = xContext.getServiceManager();      Object oDesktop = xMCF.createInstanceWithContext(        "com.sun.star.frame.Desktop", xContext);      XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(        XDesktop.class, oDesktop);</code></pre>    <p>接下来我们需要加载目标Doc文档:</p>    <pre>  <code class="language-java">// Load the Document  String workingDir = "C:/projects/";  String myTemplate = "letterTemplate.doc";    if (!new File(workingDir + myTemplate).canRead()) {   throw new RuntimeException("Cannot load template:" + new File(workingDir + myTemplate));  }    XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime   .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);    String sUrl = "file:///" + workingDir + myTemplate;    PropertyValue[] propertyValues = new PropertyValue[0];    propertyValues = new PropertyValue[1];  propertyValues[0] = new PropertyValue();  propertyValues[0].Name = "Hidden";  propertyValues[0].Value = new Boolean(true);    XComponent xComp = xCompLoader.loadComponentFromURL(   sUrl, "_blank", 0, propertyValues);</code></pre>    <p><img src="https://simg.open-open.com/show/701fbe6cf65a06581a333bec0f18bb0d.png"> 然后我们可以使用如下方式对内容进行替换:</p>    <pre>  <code class="language-java">// Search and replace  XReplaceDescriptor xReplaceDescr = null;  XReplaceable xReplaceable = null;    XTextDocument xTextDocument = (XTextDocument) UnoRuntime    .queryInterface(XTextDocument.class, xComp);    xReplaceable = (XReplaceable) UnoRuntime    .queryInterface(XReplaceable.class, xTextDocument);    xReplaceDescr = (XReplaceDescriptor) xReplaceable    .createReplaceDescriptor();    // mail merge the date  xReplaceDescr.setSearchString("<date>");  xReplaceDescr.setReplaceString(new Date().toString());  xReplaceable.replaceAll(xReplaceDescr);    // mail merge the addressee  xReplaceDescr.setSearchString("<addressee>");  xReplaceDescr.setReplaceString("Best Friend");  xReplaceable.replaceAll(xReplaceDescr);    // mail merge the signatory  xReplaceDescr.setSearchString("<signatory>");  xReplaceDescr.setReplaceString("Your New Boss");  xReplaceable.replaceAll(xReplaceDescr);</code></pre>    <p>然后可以输出到PDF中:</p>    <pre>  <code class="language-java">// save as a PDF  XStorable xStorable = (XStorable) UnoRuntime    .queryInterface(XStorable.class, xComp);    propertyValues = new PropertyValue[2];  propertyValues[0] = new PropertyValue();  propertyValues[0].Name = "Overwrite";  propertyValues[0].Value = new Boolean(true);  propertyValues[1] = new PropertyValue();  propertyValues[1].Name = "FilterName";  propertyValues[1].Value = "writer_pdf_Export";    // Appending the favoured extension to the origin document name  String myResult = workingDir + "letterOutput.pdf";  xStorable.storeToURL("file:///" + myResult, propertyValues);    System.out.println("Saved " + myResult);</code></pre>    <p><img src="https://simg.open-open.com/show/4fe0238895e3dea877c710f5caa9b2b0.png"></p>    <h2><a href="/misc/goto?guid=4959713644737081287" rel="nofollow,noindex">xdocreport</a></h2>    <p>本文的核心代码如下,完整代码查看 <a href="/misc/goto?guid=4959713644822230586" rel="nofollow,noindex">这里</a> :</p>    <pre>  <code class="language-java">/**   * @param inpuFile 输入的文件流   * @param outFile  输出的文件对象   * @return   * @function 利用Apache POI从输入的文件中生成PDF文件   */  @SneakyThrows  public static void convertWithPOI(InputStream inpuFile, File outFile) {        //从输入的文件流创建对象      XWPFDocument document = new XWPFDocument(inpuFile);        //创建PDF选项      PdfOptions pdfOptions = PdfOptions.create();//.fontEncoding("windows-1250")        //为输出文件创建目录      outFile.getParentFile().mkdirs();        //执行PDF转化      PdfConverter.getInstance().convert(document, new FileOutputStream(outFile), pdfOptions);    }    /**   * @param inpuFile   * @param outFile   * @param renderParams   * @function 先将渲染参数填入模板DOCX文件然后生成PDF   */  @SneakyThrows  public static void convertFromTemplateWithFreemarker(InputStream inpuFile, File outFile, Map<String, Object> renderParams) {        //创建Report实例      IXDocReport report = XDocReportRegistry.getRegistry().loadReport(              inpuFile, TemplateEngineKind.Freemarker);        //创建上下文      IContext context = report.createContext();        //填入渲染参数      renderParams.forEach((s, o) -> {          context.put(s, o);      });        //创建输出流      outFile.getParentFile().mkdirs();        //创建转化参数      Options options = Options.getTo(ConverterTypeTo.PDF).via(              ConverterTypeVia.XWPF);        //执行转化过程      report.convert(context, options, new FileOutputStream(outFile));  }</code></pre>    <p> </p>    <p>来自:https://segmentfault.com/a/1190000006789644</p>    <p> </p>