Useful PHP Code Snippets for Converting PDF, TXT, HTML, and Images

jopen, 11 years ago

Below are some useful PHP code snippets for converting PDF, TXT, HTML, and image files.

Convert a PDF to a JPG image - PDF2JPG

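ImageMagick must be installed first, because the script calls its 'convert' command-line tool. This is a simple and very practical snippet for converting a PDF file to a JPG image.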

    <?php
    /**
     * PDF2JPG script
     * with ImageMagick
     */

    $pdf_file = './pdf/_folder/example.pdf';
    // Make sure that Apache has permission to write to this folder (a common problem).
    $save_to = './image_folder/example.jpg';

    // Run the ImageMagick 'convert' command to convert the PDF to JPG with the given settings.
    // escapeshellarg() quotes the paths safely for the shell.
    exec('convert ' . escapeshellarg($pdf_file) . ' -colorspace RGB -resize 800 ' . escapeshellarg($save_to), $output, $return_var);

    if ($return_var == 0) { // exit status 0 means the conversion succeeded
        print "Conversion OK";
    } else {
        // $output is an array of output lines, so join it before printing
        print "Conversion failed.<br />" . implode("<br />", $output);
    }
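With a multi-page PDF, 'convert' writes one image per page, and the default rendering density of 72 dpi is often too coarse for readable text. The sketch below builds the command line for a single page at a higher density. The function name build_pdf2jpg_command and its parameters are made up for illustration; the zero-based '[N]' page selector and the -density option are standard ImageMagick features.

```php
<?php
/**
 * Build an ImageMagick command that renders one PDF page to a JPG.
 * Hypothetical helper, not part of the original snippet.
 */
function build_pdf2jpg_command($pdf_file, $save_to, $page = 0, $density = 150, $width = 800) {
    // "file.pdf[0]" selects the first page; -density must come before the input file.
    $input = $pdf_file . '[' . (int) $page . ']';
    return 'convert -density ' . (int) $density . ' ' . escapeshellarg($input)
        . ' -colorspace RGB -resize ' . (int) $width . ' ' . escapeshellarg($save_to);
}

$cmd = build_pdf2jpg_command('./pdf/_folder/example.pdf', './image_folder/example.jpg');
echo $cmd, "\n";
// Pass $cmd to exec($cmd, $output, $return_var) and check $return_var as in the script above.
```

Keeping the command construction in one function makes it easy to log or inspect the exact command when a conversion fails.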

html2ps - Convert HTML to PDF

This is a very useful snippet for converting HTML to a PDF file. I have used it in several projects.

    <?php
    /**
     * Convert the page at $url to a PDF file saved at $path_to_pdf, using html2ps.
     */
    function convert_to_pdf($url, $path_to_pdf) {
        require_once(dirname(__FILE__).'/html2ps/config.inc.php');
        require_once(HTML2PS_DIR.'pipeline.factory.class.php');
        echo WRITER_TEMPDIR;
        //error_reporting(E_ALL);
        //ini_set("display_errors","1");
        @set_time_limit(10000);
        parse_config_file(HTML2PS_DIR.'html2ps.config');

        /**
         * Handles the saving of generated PDF to user-defined output file on server
         */
        class MyDestinationFile extends Destination {
            /**
             * @var String result file name / path
             * @access private
             */
            var $_dest_filename;

            function MyDestinationFile($dest_filename) {
                $this->_dest_filename = $dest_filename;
            }

            function process($tmp_filename, $content_type) {
                copy($tmp_filename, $this->_dest_filename);
            }
        }

        $media = Media::predefined("A4");
        $media->set_landscape(false);
        $media->set_margins(array('left' => 5,
                                  'right' => 5,
                                  'top' => 10,
                                  'bottom' => 10));
        $media->set_pixels(800);

        $pipeline = PipelineFactory::create_default_pipeline("", // Auto-detect encoding
                                                             "");
        // Override HTML source
        $pipeline->fetchers[] = new FetcherURL;
        $pipeline->data_filters[] = new DataFilterHTML2XHTML;
        $pipeline->parser = new ParserXHTML;
        $pipeline->layout_engine = new LayoutEngineDefault;

        $pipeline->output_driver = new OutputDriverFPDF($media);
        //$filter = new PreTreeFilterHeaderFooter("HEADER", "FOOTER");
        //$pipeline->pre_tree_filters[] = $filter;

        // Override destination to local file
        $pipeline->destination = new MyDestinationFile($path_to_pdf);

        global $g_config;
        $g_config = array(
            'cssmedia' => 'screen',
            'scalepoints' => '1',
            'renderimages' => true,
            'renderlinks' => true,
            'renderfields' => true,
            'renderforms' => false,
            'mode' => 'html',
            'encoding' => '',
            'debugbox' => false,
            'pdfversion' => '1.4',
            'draw_page_border' => false
        );
        $pipeline->configure($g_config);
        //$pipeline->add_feature('toc', array('location' => 'before'));
        $pipeline->process($url, $media);
    }

Convert HTML to text

If you are building something like a search engine or a text-based browser emulator, this snippet may come in handy.

    <?php
    // strip javascript, styles, html tags, normalize entities and spaces
    // based on http://www.php.net/manual/en/function.strip-tags.php#68757
    function html2text($html) {
        $text = $html;
        static $search = array(
            '@<script.+?</script>@usi', // Strip out javascript content
            '@<style.+?</style>@usi',   // Strip style content
            '@<!--.+?-->@us',           // Strip multi-line comments including CDATA
            '@</?[a-z].*?\>@usi',       // Strip out HTML tags
        );
        $text = preg_replace($search, ' ', $text);
        // normalize common entities
        $text = normalizeEntities($text);
        // decode other entities
        $text = html_entity_decode($text, ENT_QUOTES, 'utf-8');
        // normalize possibly repeated newlines, tabs, spaces to spaces
        $text = preg_replace('/\s+/u', ' ', $text);
        $text = trim($text);
        // we must still run htmlentities on anything that comes out!
        // for instance:
        // <<a>script>alert('XSS')//<<a>/script>
        // will become
        // <script>alert('XSS')//</script>
        return $text;
    }

    // replace encoded and double encoded entities to equivalent unicode character
    // also see /app/bookmarkletPopup.js
    function normalizeEntities($text) {
        static $find = array();
        static $repl = array();
        if (!count($find)) {
            // build $find and $repl from map one time
            // Each row: ASCII replacement, then the literal character, entity name,
            // decimal code and hex code it stands for. The literal characters are
            // written as "\u{...}" escapes (PHP 7+), matching the code points listed in each row.
            $map = array(
                array('\'', 'apos', 39, 'x27'), // Apostrophe
                array('\'', "\u{2018}", 'lsquo', 8216, 'x2018'), // Open single quote
                array('\'', "\u{2019}", 'rsquo', 8217, 'x2019'), // Close single quote
                array('"', "\u{201C}", 'ldquo', 8220, 'x201C'), // Open double quotes
                array('"', "\u{201D}", 'rdquo', 8221, 'x201D'), // Close double quotes
                array('\'', "\u{201A}", 'sbquo', 8218, 'x201A'), // Single low-9 quote
                array('"', "\u{201E}", 'bdquo', 8222, 'x201E'), // Double low-9 quote
                array('\'', "\u{2032}", 'prime', 8242, 'x2032'), // Prime/minutes/feet
                array('"', "\u{2033}", 'Prime', 8243, 'x2033'), // Double prime/seconds/inches
                array(' ', 'nbsp', 160, 'xA0'), // Non-breaking space
                array('-', "\u{2010}", 8208, 'x2010'), // Hyphen
                array('-', "\u{2013}", 'ndash', 8211, 150, 'x2013'), // En dash
                array('--', "\u{2014}", 'mdash', 8212, 151, 'x2014'), // Em dash
                array(' ', "\u{2002}", 'ensp', 8194, 'x2002'), // En space
                array(' ', "\u{2003}", 'emsp', 8195, 'x2003'), // Em space
                array(' ', "\u{2009}", 'thinsp', 8201, 'x2009'), // Thin space
                array('*', "\u{2022}", 'bull', 8226, 'x2022'), // Bullet
                array('*', "\u{2023}", 8227, 'x2023'), // Triangular bullet
                array('...', "\u{2026}", 'hellip', 8230, 'x2026'), // Horizontal ellipsis
                array('°', 'deg', 176, 'xB0'), // Degree
                array('EUR', 'euro', 8364, 'x20AC'), // Euro
                array('¥', 'yen', 165, 'xA5'), // Yen
                array('£', 'pound', 163, 'xA3'), // British Pound
                array('©', 'copy', 169, 'xA9'), // Copyright Sign
                array('®', 'reg', 174, 'xAE'), // Registered Sign
                array('(TM)', 'trade', 8482, 'x2122') // TM Sign
            );
            foreach ($map as $e) {
                for ($i = 1; $i < count($e); ++$i) {
                    $code = $e[$i];
                    if (is_int($code)) {
                        // numeric entity
                        $regex = "/&(amp;)?#0*$code;/";
                    }
                    elseif (preg_match('/^.$/u', $code)/* one unicode char*/) {
                        // single character, quoted so it is always matched literally
                        $regex = '/' . preg_quote($code, '/') . '/u';
                    }
                    elseif (preg_match('/^x([0-9A-F]{2}){1,2}$/i', $code)) {
                        // hex entity
                        $regex = "/&(amp;)?#x0*" . substr($code, 1) . ";/i";
                    }
                    else {
                        // named entity
                        $regex = "/&(amp;)?$code;/";
                    }
                    $find[] = $regex;
                    $repl[] = $e[0];
                }
            }
        } // end first time build
        return preg_replace($find, $repl, $text);
    }
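The comment inside html2text() warns that its output can still contain markup, so it must be escaped before being echoed into a page. The sketch below illustrates that point with a reduced stand-in for the tag-stripping step. strip_tags_sketch is a hypothetical name used only here, and htmlspecialchars() is one reasonable choice for the escaping step.

```php
<?php
// Hypothetical stand-in for html2text(), reduced to the tag-stripping regex.
function strip_tags_sketch($html) {
    return trim(preg_replace('@</?[a-z].*?\>@usi', ' ', $html));
}

// Nested tags survive a single stripping pass.
$untrusted = "<<a>script>alert('XSS')//<<a>/script>";
$text = strip_tags_sketch($untrusted);

// $text still contains '<' and '>' characters, so escape it before output.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
```

Escaping at output time, rather than trying to strip more aggressively, keeps the text conversion and the HTML safety concern separate.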

Convert a PDF to text with Perl

This is the only exception in the list. All the other snippets are PHP, but this short script is written in Perl and prints the text of one page (page 4 in the code) of the PDF 'demo.pdf'.

    #!/perl/bin/perl -w
    use strict;
    use CAM::PDF;
    use CAM::PDF::PageText;

    my $filename = "demo.pdf";

    my $pdf = CAM::PDF->new($filename);
    # Get the content tree of page 4 and print its text
    my $page_tree = $pdf->getPageContentTree(4);
    print CAM::PDF::PageText->render($page_tree);

    # Note: I had to install CAM::PDF::PageText by hand; it was not installed by CPAN when I installed CAM::PDF.
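For readers who would rather stay in PHP, one alternative that is not from the original post is to shell out to the pdftotext command-line tool (shipped with Poppler and Xpdf), assuming it is installed on the server. The sketch below only builds the command; build_pdftotext_command is a hypothetical helper name.

```php
<?php
/**
 * Build a pdftotext command that writes the text of pages $first..$last to stdout.
 * Hypothetical helper; assumes the pdftotext binary is on the PATH.
 */
function build_pdftotext_command($pdf_file, $first = 1, $last = 1) {
    // -f / -l select the first and last page; "-" as the output file means stdout.
    return 'pdftotext -f ' . (int) $first . ' -l ' . (int) $last . ' '
        . escapeshellarg($pdf_file) . ' -';
}

$cmd = build_pdftotext_command('demo.pdf', 4, 4);
echo $cmd, "\n";
// To run it: exec($cmd, $lines, $status); a $status of 0 means success
// and $lines then holds the extracted text, one array entry per line.
```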

I hope you find the snippets above useful. I wanted to include a DOC-to-PDF converter too, but PHP isn't much help in that case. However, I found a few links on converting other file formats that might be useful.

Converting documents (ODT, DOC to PDF) on PHP with Unoconv

I came across this by accident. It requires Unoconv, a Python tool that uses the LibreOffice libraries (pyuno).

Here is a guide to installation and usage:

http://tech.rgou.net/en/php/converting-documents-odt-doc-to-pdf-on-php-with-unoconv-libreoffice/
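The linked guide is not reproduced here. As a rough sketch of what calling Unoconv from PHP can look like, the example below builds a command line using unoconv's -f (target format) and -o (output location) options. The helper name build_unoconv_command and the file paths are invented for illustration, and the sketch assumes unoconv and LibreOffice are already installed.

```php
<?php
/**
 * Build a unoconv command that converts a document to PDF.
 * Hypothetical helper; assumes unoconv and LibreOffice are installed.
 */
function build_unoconv_command($input_file, $output_dir) {
    // -f sets the target format, -o the output file or directory.
    return 'unoconv -f pdf -o ' . escapeshellarg($output_dir) . ' ' . escapeshellarg($input_file);
}

$cmd = build_unoconv_command('./docs/report.doc', './pdf_folder');
echo $cmd, "\n";
// To run it: exec($cmd, $output, $return_var); a $return_var of 0 indicates success.
```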

Using LiveDocx with PHP 4 and PHP 5 NuSOAP

Here is another approach: using LiveDocx without the Zend Framework, although it does require the SOAP library NuSOAP.

Here is a link to the guide: http://www.phplivedocx.org/articles/using-livedocx-with-nusoap/