Combine Word documents using python-docx
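I have several Word files, each with specific content. I would like a snippet that shows, or helps me figure out, how to combine these Word files into a single file using the python-docx library.
For example, with the pywin32 library I did the following: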
rng = self.doc.Range(0, 0)
for d in data:
    time.sleep(0.05)
    docstart = d.wordDoc.Content.Start
    self.word.Visible = True
    docend = d.wordDoc.Content.End - 1
    location = d.wordDoc.Range(docstart, docend).Copy()
    rng.Paste()
    rng.Collapse(0)
    rng.InsertBreak(win32.constants.wdPageBreak)
But I need to do this with the python-docx library instead of win32.client.
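An alternative way to merge two documents, including all of their styles, is the Python library docxcompose ( https://pypi.org/project/docxcompose/ ). You do not need to define the styles explicitly, and you do not have to read a document paragraph by paragraph and append it to the master document. Its usage is shown in the following code: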
# Import the required packages
from docxcompose.composer import Composer
from docx import Document as Document_compose
# filename_master is the name of the file you want to merge the docx file into
master = Document_compose(filename_master)
composer = Composer(master)
# filename_second_docx is the name of the second docx file
doc2 = Document_compose(filename_second_docx)
# Append doc2 to the master using the composer.append function
composer.append(doc2)
# Save the combined docx under a new name
composer.save("combined.docx")
If you want to combine multiple documents into one docx file, you can use the following function:
# filename_master is the name of the file you want to merge all the documents into
# files_list is a list containing the filenames of all the docx files to be merged
def combine_all_docx(filename_master, files_list):
    number_of_sections = len(files_list)
    master = Document_compose(filename_master)
    composer = Composer(master)
    for i in range(0, number_of_sections):
        doc_temp = Document_compose(files_list[i])
        composer.append(doc_temp)
    composer.save("combined_file.docx")
# For example:
# filename_master = "file1.docx"
# files_list = ["file2.docx", "file3.docx", "file4.docx", "file5.docx"]
# Calling the function:
# combine_all_docx(filename_master, files_list)
# This combines all the documents in files_list into file1.docx and saves the merged document as combined_file.docx
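I've adjusted the example above to work with the latest version of python-docx (0.8.6 at the time of writing). Note that this only copies the elements (merging the styles of the elements is more complicated):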
from docx import Document
files = ['file1.docx', 'file2.docx']
def combine_word_documents(files):
    merged_document = Document()
    for index, file in enumerate(files):
        sub_doc = Document(file)
        # Don't add a page break if you've reached the last file.
        if index < len(files)-1:
            sub_doc.add_page_break()
        for element in sub_doc.element.body:
            merged_document.element.body.append(element)
    merged_document.save('merged.docx')
combine_word_documents(files)
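One thing to watch with the element-copy loop above: a document body typically ends with a `w:sectPr` (section properties) element, so appending every child of each source body can leave several of them in the merged body, with content placed after the target's own `w:sectPr`. The following is a minimal sketch of one way around that, working on raw `document.xml` strings with only the standard library rather than on python-docx objects. The function name `merge_body_xml` and the `_doc` helper are made up for illustration, and it still only copies elements (no styles, images or numbering).

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def merge_body_xml(target_xml, source_xml):
    """Append the source body's content to the target body, keeping one sectPr."""
    target_root = ET.fromstring(target_xml)
    source_root = ET.fromstring(source_xml)
    target_body = target_root.find(f"{{{W_NS}}}body")
    source_body = source_root.find(f"{{{W_NS}}}body")
    sect_pr_tag = f"{{{W_NS}}}sectPr"

    # Insert before the target's trailing sectPr, if it has one.
    children = list(target_body)
    if children and children[-1].tag == sect_pr_tag:
        insert_at = len(children) - 1
    else:
        insert_at = len(children)

    for element in source_body:
        if element.tag == sect_pr_tag:
            continue  # skip the source's own section properties
        # Copy instead of moving, so the source tree stays intact.
        target_body.insert(insert_at, copy.deepcopy(element))
        insert_at += 1

    return ET.tostring(target_root, encoding="unicode")


def _doc(*texts):
    """Hypothetical helper: build a tiny document.xml with one paragraph per text."""
    paragraphs = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in texts)
    return (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        f"{paragraphs}<w:sectPr/></w:body></w:document>"
    )


merged = merge_body_xml(_doc("first"), _doc("second", "third"))
print(merged)
```

The same idea should carry over to python-docx by skipping elements whose tag ends in `sectPr` inside the loop, but I have not tested that variant here.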
If your needs are simple, something like this might work:
from docx import Document

source_document = Document('source.docx')
target_document = Document()
for paragraph in source_document.paragraphs:
    text = paragraph.text
    target_document.add_paragraph(text)
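There are additional things you could do, but this should get you started.
It turns out that copying content from one Word file to another is quite complex in the general case. For example, it involves reconciling styles present in the source document that may conflict with styles in the target document. So it's not a feature we're likely to add in, say, the next year.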
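Create an empty document (empty.docx) and add your two documents to it. On each iteration of the loop over the files, add a page break if necessary.
When you are done, save the new file, which contains your two combined files.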
from docx import Document
files = ['file1.docx', 'file2.docx']
def combine_word_documents(files):
    combined_document = Document('empty.docx')
    count, number_of_files = 0, len(files)
    for file in files:
        sub_doc = Document(file)
        # Don't add a page break if you've
        # reached the last file.
        if count < number_of_files - 1:
            sub_doc.add_page_break()
        for element in sub_doc._document_part.body._element:
            combined_document._document_part.body._element.append(element)
        count += 1
    combined_document.save('combined_word_documents.docx')
combine_word_documents(files)
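If you only need to combine simple, text-only documents, you can use python-docx as mentioned above.
If you need to merge documents that contain hyperlinks, images, lists, bullet points and so on, you can use lxml to combine the document bodies together with all of the files they reference.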
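The reason the referenced files matter is that body elements point at images, hyperlinks and similar parts through relationship IDs, which are resolved in `word/_rels/document.xml.rels` of the package they came from. Copy only the body and those IDs no longer resolve to the right parts. As a starting point, here is a minimal standard-library sketch that lists what a document's body depends on. The function name `referenced_parts` and the demo package are made up for illustration, and this only inspects the package; it does not perform the merge.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"


def referenced_parts(docx_file):
    """Map each relationship ID of word/document.xml to (kind, target)."""
    with zipfile.ZipFile(docx_file) as package:
        rels = ET.fromstring(package.read("word/_rels/document.xml.rels"))
    parts = {}
    for rel in rels.iter(f"{{{RELS_NS}}}Relationship"):
        kind = rel.get("Type").rsplit("/", 1)[-1]  # e.g. "image", "styles"
        target = rel.get("Target")
        if rel.get("TargetMode") != "External":
            # Internal targets are relative to word/ unless they start with "/".
            target = posixpath.normpath(posixpath.join("/word", target)).lstrip("/")
        parts[rel.get("Id")] = (kind, target)
    return parts


# Demo on an in-memory package that holds only a relationships file.
_rels_xml = (
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rId1" Type="{REL_TYPE}styles" Target="styles.xml"/>'
    f'<Relationship Id="rId2" Type="{REL_TYPE}image" Target="media/image1.png"/>'
    f'<Relationship Id="rId3" Type="{REL_TYPE}hyperlink" '
    'Target="https://example.com/" TargetMode="External"/>'
    "</Relationships>"
)
_buffer = io.BytesIO()
with zipfile.ZipFile(_buffer, "w") as _package:
    _package.writestr("word/_rels/document.xml.rels", _rels_xml)
_buffer.seek(0)

parts = referenced_parts(_buffer)
for rel_id, (kind, target) in parts.items():
    print(rel_id, kind, target)
```

Every internal target in that listing is a file that has to be carried over to the merged package, with the relationship IDs in the copied body rewritten to match.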
This was all very useful. I combined the answers of Martijn Jacobs and Mr Kriss.
import os
from django.conf import settings  # only needed for the MEDIA_ROOT lines below
from docx import Document
def combine_word_documents(input_files):
    """
    :param input_files: an iterable with full paths to docs
    :return: a Document object with the merged files
    """
    for filnr, file in enumerate(input_files):
        # In my case the docx templates are in a Django FileField, so I add the MEDIA_ROOT; discard the next 2 lines if that is not appropriate for you.
        if 'offerte_template' in file:
            file = os.path.join(settings.MEDIA_ROOT, file)
        if filnr == 0:
            merged_document = Document(file)
            merged_document.add_page_break()
        else:
            sub_doc = Document(file)
            # Don't add a page break if you've reached the last file.
            if filnr < len(input_files)-1:
                sub_doc.add_page_break()
            for element in sub_doc.element.body:
                merged_document.element.body.append(element)
    return merged_document
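Another alternative is the Aspose.Words Cloud SDK for Python. It preserves document formatting and styles according to the ImportFormatMode parameter, which determines whose formatting is used: that of the appended document or that of the destination document. The possible values are KeepSourceFormatting and UseDestinationStyles.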
# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
import os
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile
# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx'
client_secret='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
words_api = asposewordscloud.WordsApi(client_id,client_secret)
words_api.api_client.configuration.host='https://api.aspose.cloud'
remoteFolder = 'Temp'
localFolder = 'C:/Temp'
localFileName = 'destination.docx'
remoteFileName = 'destination.docx'
localFileName1 = 'source.docx'
remoteFileName1 = 'source.docx'
#upload file
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName,'rb'),remoteFolder + '/' + remoteFileName))
words_api.upload_file(asposewordscloud.models.requests.UploadFileRequest(open(localFolder + '/' + localFileName1,'rb'),remoteFolder + '/' + remoteFileName1))
#append Word documents
requestDocumentListDocumentEntries0 = asposewordscloud.DocumentEntry(href=remoteFolder + '/' + remoteFileName1, import_format_mode='KeepSourceFormatting')
requestDocumentListDocumentEntries = [requestDocumentListDocumentEntries0]
requestDocumentList = asposewordscloud.DocumentEntryList(document_entries=requestDocumentListDocumentEntries)
request = asposewordscloud.models.requests.AppendDocumentRequest(name=remoteFileName, document_list=requestDocumentList, folder=remoteFolder, dest_file_name= remoteFolder + '/' + remoteFileName)
result = words_api.append_document(request)
#download file
request_download=asposewordscloud.models.requests.DownloadFileRequest(remoteFolder + '/' + remoteFileName)
response_download = words_api.download_file(request_download)
copyfile(response_download, localFolder + '/' +"Append_output.docx")