How do I use python-docx to find and bold certain words?

I am having trouble finding and bolding certain words taken from a dictionary. Right now my code bolds the entire paragraph instead of just the matched words.

# Convert to .docx
with open(file, 'r', encoding='utf-8') as openfile:
    line = openfile.read()
    doc.add_paragraph(line)
    
    # Bold speaker names    
    for i in dictionary:
        for p in doc.paragraphs:
            if p.text.find(i) >= 0:
                p.text = p.text.replace(i, dictionary[i])
                p.style.font.bold = True #bolds entire thing not just dictionary name
    
# Save in current repository 
doc.save(fileName + ".docx")
os.system(fileName + ".docx")

You are applying bold to the whole paragraph: p.style.font.bold = True changes the paragraph's style, so it affects everything in the paragraph (and any other paragraph that uses the same style). The paragraph also consists of only one run, because you passed "line" straight to add_paragraph. To bold only the names, break "line" into substrings and add the substrings and the names to the paragraph as separate runs, in the order you want, using add_run. Bold each name's run at the time you add it. See the python-docx documentation on bold formatting, on runs, and on add_run.
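As a rough illustration of the splitting step described above, here is a minimal sketch that cuts a string into (segment, is_bold) pairs. The function name split_for_bold and the sample speaker names are made up for this example, and it uses only the standard library, so it is not tied to any particular python-docx version.

```python
import re

def split_for_bold(text, names):
    """Split text into (segment, is_bold) pairs; segments equal to a name are bold."""
    names = [n for n in names if n]  # ignore empty strings
    if not names:
        return [(text, False)] if text else []
    # Try longer names first so "Ann Lee" wins over "Ann" when both are listed.
    ordered = sorted(names, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(n) for n in ordered) + ")"
    wanted = set(names)
    # re.split with a capturing group keeps the matched names in the result list.
    return [(part, part in wanted) for part in re.split(pattern, text) if part]

# Hypothetical sample data
pieces = split_for_bold("ALICE: hi. BOB: hello, ALICE.", ["ALICE", "BOB"])
```

Each pair would then become one paragraph.add_run(segment) call, with run.bold = True set on the returned run whenever is_bold is true.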

The code below inserts a new paragraph before the matching one, rebuilds the text there as three runs, and then clears the old paragraph. python-docx does not have a paragraph or run deleter (none that I can find), so doing it this way leaves behind an empty paragraph that still has its style attached; you should clean out the style on that paragraph as well. Your code treated all the runs in the paragraph as a single text object, so this solution does the same, which means all of the formatting on the runs in the original paragraph will be lost. A more complete solution would apply the style of the old paragraph to the new paragraph and treat each run of the old paragraph individually, but you didn't ask that question.

from docx import Document

# create document
doc = Document()

#add a paragraph comprised of a single run
paraText = '''Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Vestibulum 
dictum a mauris quis posuere. Proin ac volutpat nulla, sit amet lacinia 
sapien. 
Aenean et lectus interdum, convallis nisi ultrices, dignissim dui. Donec in 
pretium mi. 
Vestibulum turpis lorem, convallis et nisl id, aliquam laoreet sem. 
 Suspendisse erat 
 justo, faucibus ut eleifend ac, dapibus a nisl. Donec nibh velit, lacinia 
 at vestibulum
 at, ultrices volutpat lorem. Sed eu diam odio. Ut feugiat, turpis eget 
tempus malesuada, 
libero neque venenatis mi, id vestibulum lectus felis sit amet quam.'''
paragraph = doc.add_paragraph(paraText, style=None)

#Find your string
string2Bold = 'volutpat nulla'
lenString = len(string2Bold)

for oldPara in doc.paragraphs:
    if oldPara.text.find(string2Bold) >= 0:
        #insert, not add, empty paragraph
        newPara = oldPara.insert_paragraph_before(text=None, style=None)
        #get all the text from the old paragraph
        paraText = oldPara.text

        #determine sections of text before and after text to bold
        index = paraText.index(string2Bold)
        before = paraText[0:index]
        after = paraText[index + lenString:]

        #reconstruct the paragraph as three runs
        newPara.add_run(before)
        #add your bolded text
        run = newPara.add_run(string2Bold)
        run.bold = True
        newPara.add_run(after)

        #clear out the old paragraph (only when it matched, so other paragraphs
        #keep their text); there is no delete or remove that I can find in
        #python-docx, so this leaves behind an empty paragraph with a style attached
        oldPara.clear()

doc.save('boldTest.docx')
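For the dictionary case in the question, one possible variation is to rebuild the matching paragraph in place rather than inserting a new one, which avoids the leftover empty paragraph. The sketch below assumes the object passed in behaves like a python-docx Paragraph (it only relies on .text, .clear(), and .add_run()); the function name bold_replacements and its return value are assumptions made for this example, not part of python-docx. As in the code above, any formatting on the original runs is discarded.

```python
import re

def bold_replacements(paragraph, replacements):
    """Swap each key of `replacements` for its value and bold the value.

    Rewrites the paragraph in place and returns the number of matches.
    `paragraph` is assumed to offer .text, .clear() and .add_run(),
    as a python-docx Paragraph does.
    """
    # Longer keys first so a short key cannot pre-empt a longer one.
    keys = sorted((k for k in replacements if k), key=len, reverse=True)
    if not keys:
        return 0
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    text = paragraph.text
    matches = list(pattern.finditer(text))
    if not matches:
        return 0  # leave non-matching paragraphs (and their runs) untouched
    paragraph.clear()  # removes the runs; the paragraph and its style stay
    pos = 0
    for m in matches:
        if m.start() > pos:
            paragraph.add_run(text[pos:m.start()])  # plain text before the match
        run = paragraph.add_run(replacements[m.group(0)])
        run.bold = True  # bold only the replaced name
        pos = m.end()
    if pos < len(text):
        paragraph.add_run(text[pos:])  # plain text after the last match
    return len(matches)
```

With a document loaded, this would be called once per paragraph, for example bold_replacements(p, dictionary) for each p in doc.paragraphs, before doc.save(...).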
