Apache PDFBox 1.8.11 released: a PDF processing library for Java
jopen, 9 years ago
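Apache PDFBox 1.8.11 has been released. This is an incremental bug-fix release that includes a large number of bug fixes and improvements.
It is now available for download: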
http://pdfbox.apache.org/download.cgi
Main changes:
Bug fixes:
- [PDFBOX-962] - All sort of Problems when importing Xfdf files into PDFs -> damaged pdfs and NPEs
- [PDFBOX-2508] - Text extraction getting zero font height, bad widths, and ? for text in this PDF with Type 3 Fonts
- [PDFBOX-2693] - OutOfMemoryError at org.apache.fontbox.cff.IndexData.initData(IndexData.java:95)
- [PDFBOX-2816] - PDFBox makes disallowed changes when signing a signed document
- [PDFBOX-2845] - Error parsing PDF
- [PDFBOX-2901] - High CPU load and OutOfMemoryError when rendering shading
- [PDFBOX-2903] - ClassCastException at PDFParser:667
- [PDFBOX-2909] - NullPointerException when rendering shading with no function
- [PDFBOX-2911] - Merge does not close input streams
- [PDFBOX-2914] - java.lang.NegativeArraySizeException in org.apache.pdfbox.pdmodel.graphics.color.PDDeviceGray.createColorModel
- [PDFBOX-2916] - ArrayIndexOutOfBoundsException in CmapSubtable.processSubtype6
- [PDFBOX-2923] - CFFParser parser treats CIDFont's charset data as SID
- [PDFBOX-2924] - ClassCastException when doing PDFSplit
- [PDFBOX-2925] - EmptyStackException in PDFStreamEngine.getColorSpaces
- [PDFBOX-2935] - Problem while extracting font from PDFontSetting (used in PDExtendedGraphicsState)
- [PDFBOX-2940] - ClassCastException in FDF export
- [PDFBOX-2958] - TIFF-Predictor with 1 bit per component not supported
- [PDFBOX-2964] - Checkbox getOnValue() throws NPE
- [PDFBOX-2965] - NPE in PDAcroForm.getField() if the /Fields entry is missing
- [PDFBOX-2976] - java.util.zip.DataFormatException: incorrect data check
- [PDFBOX-2982] - fix ClassCastExceptions in operator methods
- [PDFBOX-2985] - Potential NPE in PDMarkedContent#getMCID()
- [PDFBOX-2986] - Potential resource leak in TTFParser's use of RAFDataStream
- [PDFBOX-2987] - NPE in PDTrueTypeFont.extractCMaps
- [PDFBOX-2988] - Infinite recursion in ExtractImages 1.8.11-SNAPSHOT
- [PDFBOX-2989] - LZW decode filter shouldn't throw IndexOutOfBoundsException
- [PDFBOX-2990] - PDDocument.load fails to load a PDF document.
- [PDFBOX-2996] - StackOverflow in Quicksort
- [PDFBOX-3002] - PDF files not closed after load fails
- [PDFBOX-3022] - Maven repos should be https
- [PDFBOX-3034] - Newly created XRef stream has direct root objects
- [PDFBOX-3035] - Files with missing xref table must fail
- [PDFBOX-3041] - Wrong default type in Xref stream W0 element
- [PDFBOX-3087] - Metadata stream should not be compressed
- [PDFBOX-3097] - ClassCastException in Axial / Radial shading when object reference in extends
- [PDFBOX-3110] - Extract by beads doesn't work
- [PDFBOX-3114] - Visible signatures in different pages changes previous revision
- [PDFBOX-3153] - Direct JPEG extraction results in invalid images in 2.0.0 releases.
- [PDFBOX-3155] - org.apache.pdfbox.util.PDFTextStripper class initialization throws NumberFormatException with recent Verona-enabled Java 9 JVMs
- [PDFBOX-3157] - PDOutputIntent has N=3 (RGB) hardcoded
- [PDFBOX-3173] - Signature dictionary is not decrypted in encrypted files
- [PDFBOX-3190] - Links don't work in firefox
- [PDFBOX-3193] - New NPE in PDFBox 1.8.11-rc1 in Acroform PDCheckbox's isChecked()
Improvements:
- [PDFBOX-1621] - Add setModifiedDate(Calendar c) to PDAnnotation
- [PDFBOX-2891] - Use animal sniffer maven plugin to detect non java 5 api usage within the 1.8 branch
- [PDFBOX-2952] - Log statement on level 'severe' while nothing else indicates error
- [PDFBOX-2962] - Handle TIFF predictor for bpc 2 and 4 / optimize existing predictor code
- [PDFBOX-3007] - Preflight cookbook example is inefficient
- [PDFBOX-3176] - Add a removeRegion method in PDFTextSTripperByArea class
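PDFBox is a Java library for working with PDF documents. It supports creating and manipulating PDF documents and extracting their content, and it also ships with several command-line utilities.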
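Main features include:
- Extracting text from PDFs
- Merging PDF documents
- Encrypting and decrypting PDF documents
- Integration with the Lucene search engine
- Filling in PDF/XFDF form data
- Creating PDF documents from text files
- Creating images from PDF pages
- Printing PDF documents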
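As an illustration of the first feature in the list, text extraction with the 1.8.x API might look roughly like the sketch below. It is not taken from the release notes: the class name `ExtractTextSketch` and the helper `extractText` are made up for this example, and it assumes the `pdfbox` 1.8.11 jar and its dependencies are on the classpath. `PDFTextStripper` is imported from `org.apache.pdfbox.util`, the package named in PDFBOX-3155 above.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Hypothetical example class; not part of PDFBox itself.
public class ExtractTextSketch {

    // Loads the given PDF file and returns its text content.
    static String extractText(File pdf) throws IOException {
        PDDocument document = PDDocument.load(pdf);
        try {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        } finally {
            // Always release the document, even if extraction fails.
            document.close();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExtractTextSketch <file.pdf>");
            return;
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

The try/finally form is used instead of try-with-resources because, per PDFBOX-2891 above, the 1.8 branch is kept compatible with the Java 5 API.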