
Having trouble using Python and LibreOffice to convert pdf to docx and doc to docx

I have spent a good deal of time trying to work out exactly what is going wrong with the code I am using to convert pdf to docx (and doc to docx) with LibreOffice.

I have test-run some of the code I found to be relevant both from the Windows Run dialog and from Python, and neither works.

I have LibreOffice v6.0.2 installed on Windows.

I have been using variations of this code to try to convert some pdf files to docx (the specific pdf file is not really relevant):

    import subprocess
    lowriter='C://Program Files/LibreOffice/program/swriter.exe'
    subprocess.run('{} --invisible --convert-to docx --outdir "{}" "{}"'
                   .format(lowriter, 'dir', 'filepath.pdf'), shell=True)

Again, I have tried this both from the Run dialog on Windows and through Python using the code above, with no luck. I also tried without --outdir, in case I was writing that incorrectly, but I always get a return code of 1:

    CompletedProcess(args='C://Program Files/LibreOffice/program/swriter.exe 
    --invisible --convert-to docx --outdir "{dir}" 
    {filepath.pdf}"', returncode=1)

Here dir and filepath.pdf are placeholders I have put in.

I have a similar problem with the doc to docx conversion.

There are a number of problems here. You should first get the --convert-to call working from the command line, as @CristiFati commented, and then implement it in Python.

Here is the code that works on my system. There is no // in the path, and the path to the executable needs quotes because it contains spaces. Also, the folder is LibreOffice 5 on my system.

    import subprocess
    lowriter = 'C:/Program Files (x86)/LibreOffice 5/program/swriter.exe'
    subprocess.run(
        '"{}" --convert-to docx --outdir "{}" "{}"'
        .format(lowriter, 'dir', 'filepath.doc'), shell=True)
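
As a variation on the call above, here is a minimal sketch that passes the arguments as a list instead of a formatted string, so no manual quoting is needed for paths with spaces. The helper names `build_convert_command` and `convert`, and the default `SOFFICE` path, are assumptions for illustration and not part of the original answer; adjust the path to your installation.

```python
import subprocess
from pathlib import Path

# Assumed install location; change this to match your system.
SOFFICE = "C:/Program Files/LibreOffice/program/soffice.exe"


def build_convert_command(source, outdir, target="docx", soffice=SOFFICE):
    """Return the argument list for a LibreOffice --convert-to call."""
    return [soffice, "--convert-to", target, "--outdir", str(outdir), str(source)]


def convert(source, outdir, target="docx", soffice=SOFFICE):
    """Run the conversion and return the path where the output is expected."""
    cmd = build_convert_command(source, outdir, target, soffice)
    # Without shell=True each list item is passed as exactly one argument,
    # so spaces in the executable path or file names are not a problem.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "conversion failed with return code {}: {}".format(
                result.returncode, result.stderr.strip()
            )
        )
    # For a plain format name such as "docx", the output file is expected
    # to keep the source's base name and land in outdir.
    return Path(outdir) / (Path(source).stem + "." + target)
```

Calling `convert('filepath.doc', 'dir')` would then either return the expected output path or raise with whatever LibreOffice wrote to stderr, which is easier to debug than a bare return code of 1.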

Finally, it looks like converting from PDF to DOCX is not supported. LibreOffice Draw can open a PDF file and save it in ODG format.

EDIT:

Here is working code to convert from PDF. I upgraded to LO 6, so the version number ("LibreOffice 5") is no longer required in the path.

    import subprocess
    loffice = 'C:/Program Files/LibreOffice/program/soffice.exe'
    subprocess.run(
        '"{}" --convert-to odg --outdir "{}" "{}"'
        .format(loffice, 'dir', 'filepath.pdf'), shell=True)

This produces filepath.odg.
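
Since the install folder differs between versions ("LibreOffice 5" versus plain "LibreOffice"), it may help to look the executable up instead of hard-coding one path. The following is only a sketch: `find_soffice` and the `CANDIDATES` list are hypothetical names, and the candidate paths are assumptions based on the two locations mentioned in this answer.

```python
import shutil
from pathlib import Path

# Assumed candidate locations; extend this list for your own machines.
CANDIDATES = [
    "C:/Program Files/LibreOffice/program/soffice.exe",
    "C:/Program Files (x86)/LibreOffice 5/program/soffice.exe",
]


def find_soffice(candidates=CANDIDATES):
    """Return the first existing candidate path, else whatever is on PATH.

    Returns None if no candidate exists and "soffice" is not on PATH.
    """
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    # Fall back to a PATH lookup.
    return shutil.which("soffice")
```

The returned value can be used in place of the hard-coded `loffice` string; a `None` result means LibreOffice was not found in any of the assumed locations.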

Install the pdf2docx package for Python (pip install pdf2docx), then use its Converter class:

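    from pdf2docx import Converter

    # Raw strings keep the backslashes in the Windows paths literal.
    source      = r'C:\Users\sd\Desktop\New Project/Document2.pdf'
    destination = r'C:\Users\sd\Desktop\New Project/sample_6.docx'

    def Converter_pdf2docx(source, destination):
        cv = Converter(source)
        # start=0 and end=None convert every page of the PDF.
        cv.convert(destination, start=0, end=None)
        cv.close()

    Converter_pdf2docx(source, destination)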
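
If the .docx should simply sit next to the PDF under the same base name, the destination can be derived rather than typed out. This is a rough sketch under that assumption; `docx_path_for` and `convert_pdf` are hypothetical helper names, and pdf2docx is imported inside the function so the path helper works even where the package is not installed.

```python
from pathlib import Path


def docx_path_for(pdf_path, outdir=None):
    """Return the .docx path matching pdf_path, optionally in another folder."""
    pdf_path = Path(pdf_path)
    folder = Path(outdir) if outdir is not None else pdf_path.parent
    return folder / (pdf_path.stem + ".docx")


def convert_pdf(pdf_path, outdir=None):
    """Convert one PDF with pdf2docx and return the path of the new .docx."""
    from pdf2docx import Converter  # third-party package: pip install pdf2docx

    target = docx_path_for(pdf_path, outdir)
    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(target))
    finally:
        # Close the converter even if the conversion raises.
        cv.close()
    return target
```

With this, `convert_pdf(source)` would write Document2.docx into the same folder as Document2.pdf.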
