Python: find and replace a string and create two paragraphs before it in a Word document

I have a VBA macro. In it, I have:

.Text = "Pollution"
.Replacement.Text = "^p^pChemical"

Here, '^p^pChemical' means: replace the word Pollution with Chemical and create two empty paragraphs before the replaced word.
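To pin down the target behaviour before touching python-docx, here is a minimal sketch in plain Python. It assumes each ^p stands for one paragraph mark, so '^p^pChemical' ends the current paragraph, leaves one empty paragraph, and starts a new paragraph with Chemical. The helper name split_paragraph is hypothetical, and it works on strings only, not on a .docx file.

```python
def split_paragraph(text, target, replacement):
    """Model a '^p^p' + replacement substitution on one paragraph's text.

    Returns the paragraph texts that result. Assumption: each '^p' is one
    paragraph mark, so two of them leave one empty paragraph between the
    text before the match and the replacement.
    """
    pieces = text.split(target)
    paragraphs = [pieces[0]]
    for piece in pieces[1:]:
        paragraphs.append('')                   # the empty paragraph
        paragraphs.append(replacement + piece)  # replacement starts a new paragraph
    return paragraphs


print(split_paragraph('The term Pollution means harm.', 'Pollution', 'Chemical'))
# ['The term ', '', 'Chemical means harm.']
```

If a different number of empty paragraphs is wanted, only the line that appends '' needs to change.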

Before:

[image]

After:

Have you noticed that the word Pollution has been replaced with Chemical and that two empty paragraphs precede it? This is what I want to do in Python.

[image]

My code so far:

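from docx import Document

document = Document('Example.docx')
for paragraph in document.paragraphs:
    if 'Pollution' in paragraph.text:
        # Replace the word (assigning to .text drops run-level formatting).
        paragraph.text = paragraph.text.replace('Pollution', 'Chemical')
        # Unsolved: add two empty paragraphs before 'Chemical'.
        # document.add_paragraph() only appends at the end of the document.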

I want to open a Word document, find a word, replace it with another word, and create two empty paragraphs before the replaced word.

This takes the text from your document, replaces each instance of the word Pollution with Chemicals, and adds empty paragraphs in between. It doesn't change the original document; instead it creates a copy. This is probably the safer route anyway.

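import re
from docx import Document

# Words to find (case-insensitive) mapped to their replacements.
ref = {"Pollution": "Chemicals", "Ocean": "Sea", "Speaker": "Magnet"}


def get_old_text():
    """Return the text of demo.docx, one line per paragraph."""
    doc1 = Document('demo.docx')
    full_text = [para.text for para in doc1.paragraphs]
    return '\n'.join(full_text)


def create_new_document(ref, text):
    """Write a copy in which every key of ref is replaced and preceded by two empty paragraphs."""
    doc2 = Document()
    # One alternation pattern, so a line containing several keys is handled once
    # and a line containing none is still copied.
    pattern = re.compile('(' + '|'.join(map(re.escape, ref)) + ')', flags=re.I)
    lookup = {k.lower(): v for k, v in ref.items()}
    for line in text.split('\n'):
        # With a capturing group, re.split returns [text, key, text, key, text, ...].
        parts = pattern.split(line)
        doc2.add_paragraph(parts[0])
        for key, rest in zip(parts[1::2], parts[2::2]):
            doc2.add_paragraph('')
            doc2.add_paragraph('')
            doc2.add_paragraph(lookup[key.lower()] + rest)
    # Save under a new name (chosen here for illustration) so demo.docx is not overwritten.
    doc2.save('demo_new.docx')


text = get_old_text()
create_new_document(ref, text)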

You can search through each paragraph to find the word of interest, and call insert_paragraph_before to add the new elements:

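def replace(doc, target, replacement):
    # Requires Python 3.8+ for the := operator.
    for par in list(doc.paragraphs):
        text = par.text
        prefix = ''
        while (index := text.find(target)) != -1:
            # Text before the match becomes its own paragraph, followed by an empty one.
            par.insert_paragraph_before(prefix + text[:index].rstrip())
            par.insert_paragraph_before('')
            # Continue searching in what follows the match only.
            text = text[index + len(target):]
            prefix = replacement
            par.text = prefix + text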

list(doc.paragraphs) makes a copy of the list, so that the iteration is not thrown off when you insert elements.
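The same hazard can be seen with an ordinary Python list. Below is a small sketch, with a hypothetical helper insert_blank_before, that loops over a snapshot while inserting into the original. It illustrates the general pattern only and makes no claim about how python-docx builds doc.paragraphs internally.

```python
def insert_blank_before(items, target):
    """Insert '' before every occurrence of target, iterating over a snapshot."""
    position = 0
    for item in list(items):       # snapshot: unaffected by the inserts below
        if item == target:
            items.insert(position, '')
            position += 1          # step past the element just inserted
        position += 1
    return items


print(insert_blank_before(['a', 'X', 'b', 'X'], 'X'))
# ['a', '', 'X', 'b', '', 'X']
```

Looping over items directly while inserting before the current element would keep revisiting the same element, because every insert pushes it one position further along.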

Call this function once for each word you need to replace.

You need to use \n for a new line. Using re should work like this:

import re

before = "The term Pollution means the manifestation of any unsolicited foreign substance in something. When we talk about pollution on earth, we refer to the contamination that is happening of the natural resources by various pollutants"
pattern = re.compile("pollution", re.IGNORECASE)
after = pattern.sub("\n\nChemical", before)
print(after)

Which will output:

The term 

Chemical means the manifestation of any unsolicited foreign substance in something. When we talk about 

Chemical on earth, we refer to the contamination that is happening of the natural resources by various pollutants
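Note that the result above is a single string, and a '\n' inside a string is not by itself a separate Word paragraph. Here is a minimal sketch, using only the standard library, that splits the substituted string into one text per paragraph and restricts the match to whole words with \b. The helper name to_paragraph_texts is hypothetical; each returned string could then be passed to python-docx's add_paragraph.

```python
import re


def to_paragraph_texts(text, target, replacement):
    """Replace whole-word, case-insensitive matches of target and
    return the result as a list of paragraph texts."""
    pattern = re.compile(r'\b' + re.escape(target) + r'\b', re.IGNORECASE)
    # A function replacement avoids backslash processing of the replacement string.
    substituted = pattern.sub(lambda match: '\n\n' + replacement, text)
    return substituted.split('\n')


sample = 'The term Pollution means harm. Air pollution and pollutants differ.'
for paragraph_text in to_paragraph_texts(sample, 'pollution', 'Chemical'):
    print(repr(paragraph_text))
# 'The term '
# ''
# 'Chemical means harm. Air '
# ''
# 'Chemical and pollutants differ.'
```

The word boundary keeps longer words that merely start with the target from being touched.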
