Convert Word doc, docx and Excel xls, xlsx to PDF with PHP

I am looking for a way to convert Word and Excel files to PDF using PHP.

The reason is that I need to combine files of various formats into one document. I know that if I can convert everything to PDF, I can then merge the PDFs into one file using PDFMerger (which uses fpdf).

I am already able to create PDFs from other file types and images, but am stuck with Word documents. (I think I may be able to convert the Excel files using the PHPExcel library, which I already use to create Excel files from HTML code.)

I do not use the Zend Framework, so I am hoping that someone can point me in the right direction.

Alternatively, if there is a way to create image (jpg) files from the Word documents, that would be workable.

Thanks for any help!

I found a solution to my issue and, after a request, am posting it here to help others. Apologies if I have missed any details; it has been a while since I worked on this solution.

The first requirement is to install OpenOffice.org on the server. I asked my hosting provider to install the OpenOffice RPM on my VPS. This can be done through WHM directly.

Now that the server can handle MS Office files, you can convert them by executing command-line instructions via PHP. To handle this, I found PyODConverter: https://github.com/mirkonasato/pyodconverter

I created a directory on the server and placed the PyODConverter Python file within it. I also created a plain text file above the web root (I named it "adocpdf"), with the following command-line instructions in it:

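#!/bin/sh
# The three arguments are concatenated as-is below, so $directory needs its
# trailing slash and $extension its leading dot.
directory=$1
filename=$2
extension=$3
SERVICE='soffice'
# Start a headless OpenOffice.org listener on port 8100 if none is running
if [ "$(ps ax | grep -v grep | grep -c "$SERVICE")" -lt 1 ]; then
    unset DISPLAY
    /usr/bin/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
    sleep 5s
fi
python /home/website/python/DocumentConverter.py "/home/website/$directory$filename$extension" "/home/website/$directory$filename.pdf"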

This checks that the OpenOffice.org process is running and then calls the PyODConverter script to process the file and output it as a PDF. The three variables on the first three lines are supplied when the script is executed from within a PHP file. The delay ("sleep 5s") gives OpenOffice.org enough time to start if required. I have used this for months now and the 5-second gap seems to give enough breathing room.

The script will create a PDF version of the document in the same directory as the original.

Finally, initiate the conversion of a Word / Excel file from within PHP (I have it within a function that checks whether the file we are dealing with is a Word / Excel document):

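// use openoffice.org
$output = array();
$return_var = 0;
// escapeshellarg() keeps spaces or shell metacharacters in the values from breaking the command
exec('/opt/adocpdf ' . escapeshellarg($directory) . ' ' . escapeshellarg($filename) . ' ' . escapeshellarg($extension), $output, $return_var);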

This PHP function is called once the Word / Excel file has been uploaded to the server. The three variables in the exec() call correspond directly to the three at the start of the plain text script above. Note that the $directory variable requires no leading forward slash if the file to convert is within the web root.
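A minimal sketch of how the call above might be wrapped, assuming the same /opt/adocpdf script and the /home/website/ base path used in the shell script. The function names buildAdocpdfCommand and convertToPdf are hypothetical, not part of the original setup. The wrapper escapes each argument and reports failure when the script exits with a non-zero status or the expected PDF is missing.

```php
<?php
// Hypothetical helper: builds the command line for the adocpdf script shown above.
function buildAdocpdfCommand(string $directory, string $filename, string $extension): string
{
    return '/opt/adocpdf '
        . escapeshellarg($directory) . ' '
        . escapeshellarg($filename) . ' '
        . escapeshellarg($extension);
}

// Hypothetical wrapper: runs the script and checks that the PDF was written.
// Assumes files live under /home/website/, as in the shell script.
function convertToPdf(string $directory, string $filename, string $extension): bool
{
    $output = array();
    $return_var = 0;
    exec(buildAdocpdfCommand($directory, $filename, $extension), $output, $return_var);

    return $return_var === 0
        && is_file('/home/website/' . $directory . $filename . '.pdf');
}
```

It could then be called as convertToPdf('uploads/', 'report', '.docx'), where the directory carries its trailing slash and the extension its leading dot, matching how the script concatenates them.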

OK, that's it! Hopefully this will be useful to someone and save them the difficulties and learning curve I faced.

My two cents on Word 2007 .docx, Word 97-2004 .doc, PDF and all the other MS Office formats that want to be "converted from y to z, but in reality don't want to be". In my experience so far, conversion with LibreOffice or OpenOffice can't be relied on, although .doc documents tend to be better supported than Word 2007's .docx. In general it is very hard to convert .docx to .doc without breaking anything.

.docx files also tend to be extremely useful for templating, whereas .doc files, being binary, are not.

The conversion from .doc to PDF was quite reliable most of the time. If you can still influence the design or content of the Word document, this might be satisfactory. In my situation, however, documents were supplied by foreign companies, and even after generating from the .docx templates, in some scenarios the generated .docx had to be slightly modified with supplementary text before it was converted to PDF.


WINDOWS BASED!

All these hiccups led me to conclude that the only truly reliable conversion method I found was to use the COM class in PHP and let the MS Word or Excel application do all the work. I'll just give an example of converting .docx to .doc and/or PDF. If you do not have MS Office installed, you can download a 60-day trial version, which should give you enough room for testing.

The COM/.NET extension is commented out by default in php.ini; just search for the line containing php_com_dotnet.dll and uncomment it like so:

  extension=php_com_dotnet.dll

Restart the web server (IIS is not a prerequisite; Apache will work just as well).

The code below demonstrates how easy it is.

  $word = new COM("Word.Application") or die ("Could not initialise Object.");
  // set it to 1 to see the MS Word window (the actual opening of the document)
  $word->Visible = 0;
  // recommend to set to 0, disables alerts like "Do you want MS Word to be the default .. etc"
  $word->DisplayAlerts = 0;
  // open the word 2007-2013 document 
  $word->Documents->Open('yourdocument.docx');
  // save it as word 2003
  $word->ActiveDocument->SaveAs('newdocument.doc');
  // convert word 2007-2013 to PDF
  $word->ActiveDocument->ExportAsFixedFormat('yourdocument.pdf', 17, false, 0, 0, 0, 0, 7, true, true, 2, true, true, false);
  // quit the Word process
  $word->Quit(false);
  // clean up
  unset($word);

This is just a small demonstration. I can only say that when it comes to conversion, this was the only really reliable option I could use, and even recommend.
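Since the question also covers Excel, here is a rough sketch of the same idea using the Excel application. It assumes Windows, an installed MS Excel and the php_com_dotnet extension enabled as described above. The function name excelToPdf is hypothetical, and absolute paths are the safer choice because Excel does not resolve relative paths against PHP's working directory.

```php
<?php
// Sketch only: requires Windows, MS Excel and the php_com_dotnet extension.
// excelToPdf is a hypothetical name; pass absolute paths for both arguments.
function excelToPdf(string $xlsxPath, string $pdfPath): void
{
    $excel = new COM("Excel.Application");
    // keep the Excel window hidden and suppress dialog boxes
    $excel->Visible = 0;
    $excel->DisplayAlerts = 0;

    try {
        $workbook = $excel->Workbooks->Open($xlsxPath);
        // 0 = xlTypePDF
        $workbook->ExportAsFixedFormat(0, $pdfPath);
        // close without saving changes to the workbook
        $workbook->Close(false);
    } finally {
        // always quit so that no orphaned EXCEL.EXE process is left behind
        $excel->Quit();
        unset($excel);
    }
}
```

The try/finally block is there so that Excel is shut down even when opening or exporting the workbook throws an exception.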

1) I am using WAMP.

2) I have installed OpenOffice (from Apache: http://www.openoffice.org/download/ ).

3) $output_dir = "C:/wamp/www/projectfolder/"; is my project folder, where I want to create the output file.

4) I have already placed my input file at C:/wamp/www/projectfolder/wordfile.docx

Then I run my code (given below):

    <?php
    set_time_limit(0);

    function MakePropertyValue($name, $value, $osm) {
        $oStruct = $osm->Bridge_GetStruct("com.sun.star.beans.PropertyValue");
        $oStruct->Name = $name;
        $oStruct->Value = $value;
        return $oStruct;
    }

    function word2pdf($doc_url, $output_url) {
        //Invoke the OpenOffice.org service manager
        $osm = new COM("com.sun.star.ServiceManager") or die ("Please be sure that OpenOffice.org is installed.\n");
        //Set the application to remain hidden to avoid flashing the document onscreen
        $args = array(MakePropertyValue("Hidden", true, $osm));
        //Launch the desktop
        $oDesktop = $osm->createInstance("com.sun.star.frame.Desktop");
        //Load the .doc file, and pass in the "Hidden" property from above
        $oWriterDoc = $oDesktop->loadComponentFromURL($doc_url, "_blank", 0, $args);
        //Set up the arguments for the PDF output
        $export_args = array(MakePropertyValue("FilterName", "writer_pdf_Export", $osm));
        //print_r($export_args);
        //Write out the PDF
        $oWriterDoc->storeToURL($output_url, $export_args);
        $oWriterDoc->close(true);
    }

    $output_dir = "C:/wamp/www/projectfolder/";
    $doc_file = "C:/wamp/www/projectfolder/wordfile.docx";
    $pdf_file = "outputfile_name.pdf";

    $output_file = $output_dir . $pdf_file;
    $doc_file = "file:///" . $doc_file;
    $output_file = "file:///" . $output_file;
    word2pdf($doc_file, $output_file);
    ?>
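The code above builds the file:/// URLs by plain string concatenation, which works for the paths shown but would produce an invalid URL for paths containing spaces or backslashes. A small sketch of a helper that handles those cases follows; pathToFileUrl is a hypothetical name and not part of the OpenOffice API.

```php
<?php
// Hypothetical helper: converts a local path into a file:/// URL.
// Backslashes become forward slashes and each path segment is
// percent-encoded, while a Windows drive letter keeps its colon.
function pathToFileUrl(string $path): string
{
    $path = str_replace('\\', '/', $path);
    $segments = array_map('rawurlencode', explode('/', ltrim($path, '/')));
    $url = implode('/', $segments);
    // rawurlencode() turns "C:" into "C%3A"; restore the drive colon
    $url = preg_replace('/^([A-Za-z])%3A/', '$1:', $url);

    return 'file:///' . $url;
}
```

With it, the last lines of the script could read word2pdf(pathToFileUrl($doc_file), pathToFileUrl($output_file)); in place of the two "file:///" concatenations.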

I successfully put a portable version of LibreOffice on my host's web server, which I call from PHP to do a command-line conversion from .docx, etc. to PDF on the fly. I do not have admin rights on my host's web server. Here is my blog post about what I did:

http://geekswithblogs.net/robertphyatt/archive/2011/11/19/converting-.docx-to-pdf-or-.doc-to-pdf-or-.doc.aspx

Yay! Convert directly from .docx or .odt to .pdf using PHP with LibreOffice (OpenOffice's successor)!

OpenOffice / LibreOffice based solutions will do an OK job, but don't expect your PDFs to resemble your source files if those were created in MS Office. A PDF that looks 90% like the original is not considered acceptable in many fields.

The only way to make sure your PDFs look exactly like the originals is to use a solution that uses the official MS Office DLLs under the hood. If you are running your PHP solution on non-Windows servers, this requires an additional Windows server. This may be a showstopper, but if you really care about the look and feel of your PDFs you may not have a choice.

Have a look at this blog post. It shows how to use PHP to convert MS Office files with a high level of fidelity.

Disclaimer: I wrote this blog post and worked on a related commercial product, so consider me biased. However, it appears to be a great solution for the PHP people I work with.

Step 1. Install "Apache_OpenOffice_4.1.2" on your system.

Step 2. Download the "unoconv" library from GitHub or anywhere else.

-> C:\Program Files (x86)\OpenOffice 4\program\python.exe = path of the OpenOffice install directory

-> D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv = path of the library folder

-> D:/wamp/www/doc_to_pdf/files/'.$pdf_File_name.' = path and file name of the PDF

-> D:/wamp/www/doc_to_pdf/files/'.$doc_file_name = path of your document file

If the PDF is not created, the last step is: go to Control Panel\All Control Panel Items\Administrative Tools -> Services -> find "wampapache" -> right-click and choose Properties -> open the Log On tab, then tick the "Allow service to interact with desktop" checkbox.

Create a sample .php file, put the code below in it, and run it on a WAMP or XAMPP server:

$result = exec('"C:\Program Files (x86)\OpenOffice 4\program\python.exe" D:\wamp\www\doc_to_pdf\libobasis4.4-pyuno\unoconv -f pdf -o D:/wamp/www/doc_to_pdf/files/'.$pdf_File_name.' D:/wamp/www/doc_to_pdf/files/'.$doc_file_name);

This code works for me on the Windows 8 operating system.

I found a solution after a lot of googling. You can try it too if you are tired of searching for a good one.

General approach: using the SOAP API

You need a username and password to make SOAP requests to https://www.livedocx.com

Register at https://www.livedocx.com/user/account_registration.aspx and follow the steps there.

Use the code below in your .php file.

ini_set ('soap.wsdl_cache_enabled', 0);

// you will get this username and pass while register
define ('USERNAME', 'Username'); 
define ('PASSWORD', 'Password');

// SOAP WSDL endpoint
define ('ENDPOINT', 'https://api.livedocx.com/2.1/mailmerge.asmx?wsdl');
 
// Define timezone
date_default_timezone_set('Europe/Berlin');
$soap = new SoapClient(ENDPOINT);
$soap->LogIn(
    array(
        'username' => USERNAME,
        'password' => PASSWORD
    )
);
$data = file_get_contents('test.doc');
$soap->SetLocalTemplate(
    array(
        'template' => base64_encode($data),
        'format'   => 'doc'
    )
);
$soap->CreateDocument();
$result = $soap->RetrieveDocument(
    array(
        'format' => 'pdf'
    )
);
$data = $result->RetrieveDocumentResult;
file_put_contents('tree.pdf', base64_decode($data));
$soap->LogOut();
unset($soap);

Follow this link for more information: http://www.phplivedocx.org/

For Ubuntu

OpenOffice and unoconv need to be installed.

From the command prompt:

apt-get remove --purge unoconv
git clone https://github.com/dagwieers/unoconv
cd unoconv
sudo make install

Now add the code below to your PHP script and make sure the file is executable.

shell_exec('/usr/bin/unoconv -f pdf  folder/test.docx');
shell_exec('/usr/bin/unoconv -f pdf  folder/sachin.png');

Hope this solution helps you.

For a PHP-specific option you could try PHPWord. This library is written in pure PHP and provides a set of classes to write to and read from different document file formats (including .doc and .docx). The main drawback is that the quality of converted files can be quite variable.

Alternatively, if you want a higher-quality option you could use a file conversion API like Zamzar. You can use it to convert a wide range of office formats (and others) into PDF, and you can call it from any platform (Windows, Linux, OS X etc).

PHP code to convert a file would look like this:

<?php
$endpoint = "https://api.zamzar.com/v1/jobs";
$apiKey = "API_KEY";
$sourceFilePath = "/my.doc"; // Or docx/xls/xlsx etc
$targetFormat = "pdf";

// Wrap the path in a CURLFile so cURL uploads the file contents (PHP 5.5+)
$sourceFile = curl_file_create($sourceFilePath);

$postData = array(
  "source_file" => $sourceFile,
  "target_format" => $targetFormat
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $endpoint);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'POST');
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERPWD, $apiKey . ":"); // API key as the HTTP basic auth user name
$body = curl_exec($ch);
curl_close($ch);

$response = json_decode($body, true);
print_r($response);
?>

Full disclosure: I'm the lead developer for the Zamzar API.

Another way to do this is to pass a parameter directly to the libreoffice command:

libreoffice --convert-to pdf /path/to/file.{doc,docx}

---- ---- ---- ---- ---- ---- Explanation ---- ---- ---- ---- ---- ----

First you need to download and install LibreOffice. It can be downloaded from here.
Now open your terminal / command prompt and go to the LibreOffice root; on Windows it may be OS/Program Files/LibreOffice/program. There you'll find the executable soffice.exe.

Here you can convert the file directly with the command mentioned above, or you may use soffice in place of libreoffice.
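To run that command from PHP, a minimal sketch could look like the following. It assumes soffice is on the PATH of the web server user (otherwise pass the full path as $binary). The function names buildLibreOfficeCommand and libreOfficeToPdf are hypothetical, and the added --headless and --outdir options are standard LibreOffice command-line switches rather than something taken from the command above.

```php
<?php
// Hypothetical helper: builds a headless LibreOffice conversion command.
// $binary defaults to "soffice"; pass the full path if it is not on PATH.
function buildLibreOfficeCommand(string $inputFile, string $outputDir, string $binary = 'soffice'): string
{
    return escapeshellarg($binary)
        . ' --headless --convert-to pdf --outdir '
        . escapeshellarg($outputDir) . ' '
        . escapeshellarg($inputFile);
}

// Hypothetical wrapper: runs the conversion and returns the path of the
// resulting PDF, or null if the command failed or no PDF was written.
function libreOfficeToPdf(string $inputFile, string $outputDir): ?string
{
    exec(buildLibreOfficeCommand($inputFile, $outputDir), $output, $exitCode);

    // LibreOffice names the output after the input file, with a .pdf extension
    $pdf = rtrim($outputDir, '/\\') . DIRECTORY_SEPARATOR
        . pathinfo($inputFile, PATHINFO_FILENAME) . '.pdf';

    return ($exitCode === 0 && is_file($pdf)) ? $pdf : null;
}
```

Checking for the output file as well as the exit code is a deliberate choice here, so that a conversion which produced nothing is reported as a failure.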

Have you tried http://www.phpdocx.com/ ? Plus, it can be hosted on your server too.

The easiest way to do this, in my experience, is with the Cloudmersive free native PHP library; just call convertDocumentDocxToPdf:

<?php
require_once(__DIR__ . '/vendor/autoload.php');

// Configure API key authorization: Apikey
$config = Swagger\Client\Configuration::getDefaultConfiguration()->setApiKey('Apikey', 'YOUR_API_KEY');

$apiInstance = new Swagger\Client\Api\ConvertDocumentApi(
    new GuzzleHttp\Client(),
    $config
);
$input_file = "/path/to/file.txt"; // \SplFileObject | Input file to perform the operation on.

try {
    $result = $apiInstance->convertDocumentDocxToPdf($input_file);
    print_r($result);
} catch (Exception $e) {
    echo 'Exception when calling ConvertDocumentApi->convertDocumentDocxToPdf: ', $e->getMessage(), PHP_EOL;
}
?>

Be sure to replace $input_file with the appropriate file path. You can also configure it to use a byte array if you prefer to do it that way. The result will be the bytes of the converted PDF file.

For anyone looking to do this on Ubuntu/Linux using PHP:

Ubuntu (desktop) comes with LibreOffice installed by default. You can use a shell command to run LibreOffice headless for this.

shell_exec('/usr/bin/libreoffice --headless --convert-to pdf:writer_pdf_Export --outdir /var/www/html/demo/public_html/src/var/output /var/www/html/demo/public_html/src/var/source/sample.doc');

Hope it helps others like me.
