Combine Word documents using python-docx

I have a few Word files, each with specific content. I would like a snippet that shows me, or helps me figure out, how to combine the Word files into one file using the python-docx library.

For example, with the pywin32 library I did the following:

rng = self.doc.Range(0, 0)
for d in data:
    time.sleep(0.05)

    docstart = d.wordDoc.Content.Start
    self.word.Visible = True
    docend = d.wordDoc.Content.End - 1
    location = d.wordDoc.Range(docstart, docend).Copy()
    rng.Paste()
    rng.Collapse(0)
    rng.InsertBreak(win32.constants.wdPageBreak)

But I need to do it using the python-docx library instead of win32com.client.

An alternative approach that merges two documents, including all their styles, is to use the Python library docxcompose ( https://pypi.org/project/docxcompose/ ). We do not need to define the styling explicitly, and we do not have to read the document paragraph by paragraph and append it to the master document. Its usage is shown in the code below:

#Importing the required packages

from docxcompose.composer import Composer
from docx import Document as Document_compose
#filename_master is name of the file you want to merge the docx file into
master = Document_compose(filename_master)

composer = Composer(master)
#filename_second_docx is the name of the second docx file
doc2 = Document_compose(filename_second_docx)
#append the doc2 into the master using composer.append function
composer.append(doc2)
#Save the combined docx with a name
composer.save("combined.docx")

If you want to merge multiple documents into one docx file, you can use the function below:


from docxcompose.composer import Composer
from docx import Document as Document_compose

# filename_master is the file that all other documents are merged into.
# files_list is a list of the filenames of the docx files to append.
def combine_all_docx(filename_master, files_list):
    master = Document_compose(filename_master)
    composer = Composer(master)
    for filename in files_list:
        doc_temp = Document_compose(filename)
        composer.append(doc_temp)
    composer.save("combined_file.docx")

# For example:
# filename_master = "file1.docx"
# files_list = ["file2.docx", "file3.docx", "file4.docx", "file5.docx"]
# combine_all_docx(filename_master, files_list)
# This appends every document in files_list to file1.docx and saves
# the merged document as combined_file.docx.
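
The function above expects the master file and the list of remaining files to be typed in by hand. Below is a minimal sketch of building those two arguments from a folder, using only the standard library. The helper name `collect_docx_files` is hypothetical and not part of docxcompose. The sketch assumes that the alphabetically first file should act as the master, and it skips the `~$` lock files that Word creates next to open documents.

```python
from pathlib import Path

def collect_docx_files(folder):
    """Return (master, others) for the .docx files in `folder`, sorted by name."""
    paths = sorted(
        p for p in Path(folder).glob("*.docx")
        if not p.name.startswith("~$")  # skip Word's temporary lock files
    )
    if not paths:
        raise ValueError(f"no .docx files found in {folder}")
    # The first path becomes the master, the rest are appended to it.
    master, *others = (str(p) for p in paths)
    return master, others
```

With it, the call could become `filename_master, files_list = collect_docx_files("reports")` followed by `combine_all_docx(filename_master, files_list)`; the folder name `reports` is only an example. Sorting is plain alphabetical, so `file10.docx` sorts before `file2.docx`.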

I've adjusted the example above to work with the latest version of python-docx (0.8.6 at the time of writing). Note that this just copies the elements (merging the styles of elements is more complicated):

from docx import Document

files = ['file1.docx', 'file2.docx']

def combine_word_documents(files):
    merged_document = Document()

    for index, file in enumerate(files):
        sub_doc = Document(file)

        # Don't add a page break if you've reached the last file.
        if index < len(files) - 1:
            sub_doc.add_page_break()

        for element in sub_doc.element.body:
            merged_document.element.body.append(element)

    merged_document.save('merged.docx')

combine_word_documents(files)
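
One detail worth knowing about the loop above: in a .docx file the last child of `w:body` is normally a `w:sectPr` element holding page size and margins, so copying every body child also carries each source document's section properties into the merged body. The sketch below is not python-docx code. It uses only the standard library on a hand-written body fragment to illustrate filtering children by tag, and the helper name `content_elements` is hypothetical. Applying the same tag check inside the python-docx loop is an assumption, not something taken from the answer.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SECT_PR = f"{{{W_NS}}}sectPr"  # fully qualified tag name of <w:sectPr>

# Hand-written stand-in for the <w:body> of a source document.
BODY_XML = f"""
<w:body xmlns:w="{W_NS}">
  <w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>
  <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
  <w:sectPr><w:pgSz w:w="12240" w:h="15840"/></w:sectPr>
</w:body>
"""

def content_elements(body):
    """Return the body's children, leaving out section properties."""
    return [child for child in body if child.tag != SECT_PR]

body = ET.fromstring(BODY_XML)
kept = content_elements(body)
# Print the local tag names of the elements that would be copied.
print([element.tag.split("}")[1] for element in kept])
```

Whether dropping the source section properties is desirable depends on the documents: it keeps the merged body tidy, but page orientation or margins that differ between the source files are then lost.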

If your needs are simple, something like this might work:

from docx import Document

source_document = Document('source.docx')
target_document = Document()

for paragraph in source_document.paragraphs:
    text = paragraph.text
    target_document.add_paragraph(text)

There are additional things you can do, but that should get you started.

It turns out that copying content from one Word file to another is quite complex in the general case, involving things like reconciling styles in the source document that may conflict with styles in the target document. So it's not a feature we're likely to add in the next year, say.

Create an empty document (empty.docx) and add your two documents to it. On each iteration over the files, add a page break if necessary.

On completion, save the new file that contains your two combined files.

from docx import Document

files = ['file1.docx', 'file2.docx']

def combine_word_documents(files):
    combined_document = Document('empty.docx')
    count, number_of_files = 0, len(files)
    for file in files:
        sub_doc = Document(file)

        # Don't add a page break if you've
        # reached the last file.
        if count < number_of_files - 1:
            sub_doc.add_page_break()

        for element in sub_doc._document_part.body._element:
            combined_document._document_part.body._element.append(element)
        count += 1

    combined_document.save('combined_word_documents.docx')

combine_word_documents(files)

If you just need to combine simple documents containing only text, you can use python-docx as mentioned above.

If you need to merge documents containing hyperlinks, images, lists, bullet points, etc., you can do this by using lxml to combine the document body and all the referenced files, such as:

  • word/styles.xml
  • word/numbering.xml
  • word/media
  • [Content_Types].xml
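
A .docx file is a ZIP package, so the parts listed above can be inspected with the standard library before deciding what needs merging. This is a minimal sketch, not a merge implementation. The helper name `list_merge_relevant_parts` is hypothetical, and the in-memory package built here is a stand-in with empty placeholder parts rather than a valid Word document.

```python
import io
import zipfile

# Parts named in the list above; "word/media/" is a folder prefix.
RELEVANT = (
    "word/styles.xml",
    "word/numbering.xml",
    "word/media/",
    "[Content_Types].xml",
)

def list_merge_relevant_parts(docx_file):
    """Return the names of package parts that a full merge would have to reconcile."""
    with zipfile.ZipFile(docx_file) as package:
        return sorted(
            name for name in package.namelist()
            if name.startswith(RELEVANT)  # str.startswith accepts a tuple
        )

# Build a stand-in package in memory for demonstration purposes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    for name in ("[Content_Types].xml", "word/document.xml",
                 "word/styles.xml", "word/media/image1.png"):
        package.writestr(name, b"")
buffer.seek(0)

parts = list_merge_relevant_parts(buffer)
print(parts)
```

Passing a real file path such as `list_merge_relevant_parts("file1.docx")` works the same way, since `zipfile.ZipFile` accepts both paths and file-like objects.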

This is all very useful. I combined the answers of Martijn Jacobs and Mr Kriss.

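import os

from django.conf import settings  # only needed for the MEDIA_ROOT lookup below
from docx import Document

def combine_word_documents(input_files):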
    """
    :param input_files: an iterable with full paths to docs
    :return: a Document object with the merged files
    """
    for filnr, file in enumerate(input_files):
        # in my case the docx templates are in a FileField of Django, add the MEDIA_ROOT, discard the next 2 lines if not appropriate for you. 
        if 'offerte_template' in file:
            file = os.path.join(settings.MEDIA_ROOT, file)

        if filnr == 0:
            merged_document = Document(file)
            merged_document.add_page_break()

        else:
            sub_doc = Document(file)

            # Don't add a page break if you've reached the last file.
            if filnr < len(input_files)-1:
                sub_doc.add_page_break()

            for element in sub_doc.element.body:
                merged_document.element.body.append(element)

    return merged_document

Another alternative is the Aspose.Words Cloud SDK for Python. It retains the formatting and styles of the documents according to the ImportFormatMode parameter, which determines whether the formatting of the appended document or of the destination document is used. Possible values are KeepSourceFormatting and UseDestinationStyles.

# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
import os
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile


# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx'
client_secret='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

words_api = asposewordscloud.WordsApi(client_id,client_secret)
words_api.api_client.configuration.host='https://api.aspose.cloud'


remoteFolder = 'Temp'
localFolder = 'C:/Temp'
localFileName = 'destination.docx'
remoteFileName = 'destination.docx'
localFileName1 = 'source.docx'
remoteFileName1 = 'source.docx'

#upload file
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName,'rb'),remoteFolder + '/' + remoteFileName))
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName1,'rb'),remoteFolder + '/' + remoteFileName1))

#append Word documents
requestDocumentListDocumentEntries0 = asposewordscloud.DocumentEntry(href=remoteFolder + '/' + remoteFileName1, import_format_mode='KeepSourceFormatting')

requestDocumentListDocumentEntries = [requestDocumentListDocumentEntries0]
requestDocumentList = asposewordscloud.DocumentEntryList(document_entries=requestDocumentListDocumentEntries)
request = asposewordscloud.models.requests.AppendDocumentRequest(name=remoteFileName, document_list=requestDocumentList, folder=remoteFolder, dest_file_name= remoteFolder + '/' + remoteFileName)

result = words_api.append_document(request)

#download file
request_download=asposewordscloud.models.requests.DownloadFileRequest(remoteFolder + '/' + remoteFileName)
response_download = words_api.download_file(request_download)
copyfile(response_download, localFolder + '/' +"Append_output.docx")


 