
How to remove bold words from python docx

I have a docx that I need to preprocess using spaCy. I need to remove all words that appear in bold in the document.

I tried the following:

def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    p._p = p._element = None


length = len(document.paragraphs)
for i in range(0,length):
    for j in range(0,len(document.paragraphs[i].runs)):
        if document.paragraphs[i].runs[j].bold == True:
            delete_paragraph(document.paragraphs[i])
            length = length-1
            continue
document.save("/home/nikita/Desktop/Internship/new topic_mod/AXIS new.docx")

But I get the following error:

IndexError: Traceback (most recent call last)
<ipython-input-12-d144bd42e95e> in <module>()
  3     #print(document.paragraphs[i].text)
  4     for j in range(0,len(document.paragraphs[i].runs)):
----> 5         if document.paragraphs[i].runs[j].bold == True:
  6             delete_paragraph(document.paragraphs[i])
  7             length = length-1

IndexError: list index out of range

I cannot figure out why it's out of range. How can I remove bold words from a python-docx document?

Please help!

There are a couple of probable reasons:

  1. You need to break (not continue) out of your inner loop once you've deleted the paragraph; otherwise you try to delete the same paragraph multiple times if it has more than one bold run.

     for j in range(0,len(document.paragraphs[i].runs)):
         if document.paragraphs[i].runs[j].bold == True:
             delete_paragraph(document.paragraphs[i])
             length = length-1
             break

  2. Your list of paragraphs gets shorter each time you delete one, which changes the index of every paragraph that follows. If you traverse the paragraphs from bottom to top, that won't be a problem. Also, you can ditch all that (i, j) index management; Python rarely needs it.

     for paragraph in reversed(list(document.paragraphs)):
         for run in paragraph.runs:
             if run.bold:
                 delete_paragraph(paragraph)
                 break
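
To see the index shift in isolation, here is a minimal sketch that uses a plain Python list in place of document.paragraphs. The function names and the sample data are made up for illustration; only the looping pattern mirrors the question. Note that range(0, length) is evaluated once, so decrementing length inside the loop does not shorten the iteration.

```python
def delete_forward(items, should_delete):
    """Mimic the question's loop: index-based, deleting while moving forward."""
    length = len(items)
    for i in range(0, length):          # the range is fixed before the loop starts
        if should_delete(items[i]):     # raises IndexError once the list has shrunk
            del items[i]
            length = length - 1         # no effect on the range above
    return items


def delete_reversed(items, should_delete):
    """Walk from the last index to the first so deletions never shift unvisited items."""
    for i in reversed(range(len(items))):
        if should_delete(items[i]):
            del items[i]
    return items


if __name__ == "__main__":
    # Upper-case strings stand in for paragraphs that contain a bold run.
    sample = ["plain", "BOLD", "plain", "BOLD"]
    print(delete_reversed(list(sample), str.isupper))
    try:
        delete_forward(list(sample), str.isupper)
    except IndexError as exc:
        print("forward deletion failed:", exc)
```

The forward version also skips the element that slides into a freed slot, so even when it does not crash it can miss items.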

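Both snippets above delete the whole paragraph as soon as one bold run is found. If the goal is to drop only the bold words and keep the rest of each paragraph, one possible approach is to remove the individual runs instead. This is a sketch, not a tested recipe for every document: remove_bold_runs and strip_bold_words are hypothetical helper names, the file paths are placeholders, and run._element is a private python-docx attribute that may change between versions. It also only catches runs where bold is set directly; run.bold is None when bold is inherited from a style, and those runs are left alone.

```python
def remove_bold_runs(document):
    """Delete every run whose bold property is explicitly True; return how many."""
    removed = 0
    for paragraph in document.paragraphs:
        # Iterate over a copy so removing a run does not disturb the loop.
        for run in list(paragraph.runs):
            if run.bold:                  # False and None (inherited) are skipped
                r = run._element          # underlying <w:r> element (private API)
                r.getparent().remove(r)
                removed += 1
    return removed


def strip_bold_words(in_path, out_path):
    """Load a .docx, remove its directly bolded runs, and save the result."""
    from docx import Document             # python-docx; imported lazily

    document = Document(in_path)
    count = remove_bold_runs(document)
    document.save(out_path)
    return count
```

A call such as strip_bold_words("input.docx", "output.docx") would write a copy without the directly bolded runs; paragraphs that consisted only of bold text remain in the document as empty paragraphs.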