
How to remove section breaks from a Word document using python-docx

I am trying to remove the section breaks from a Word document. To do this, I am trying to remove the sectPr attribute from the XML generated through python-docx. This is the XML that is generated:

<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex wp14">
  <w:body>
    <w:p w14:paraId="0F1E22A8" w14:textId="1CB95B52" w:rsidR="006F7C29" w:rsidRDefault="00B46A6B">
      <w:pPr>
        <w:sectPr w:rsidR="006F7C29">
          <w:pgSz w:w="11906" w:h="16838"/>
          <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
          <w:cols w:space="708"/>
          <w:docGrid w:linePitch="360"/>
        </w:sectPr>
      </w:pPr>
      <w:r>
        <w:t>math</w:t>
      </w:r>
    </w:p>
    <w:p w14:paraId="3FE55637" w14:textId="789D24FC" w:rsidR="003660CC" w:rsidRPr="003660CC" w:rsidRDefault="003660CC" w:rsidP="008F17C5"/>
    <w:sectPr w:rsidR="003660CC" w:rsidRPr="003660CC" w:rsidSect="008F17C5">
      <w:type w:val="evenPage"/>
      <w:pgSz w:w="11906" w:h="16838"/>
      <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
      <w:cols w:space="708"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>

I have written the following code to remove sectPr:


def identifySbr(doc):
    allp=len(doc.paragraphs)
    document_xml = doc.element.xml


    for i in range(0,allp):
        c = doc.paragraphs[i]._p.xpath("./w:pPr/w:sectPr")

        if len(c)>0:
            ca = doc.paragraphs[i]._p.xpath("./w:pPr/w:sectPr")[0]
            ca.attrib.pop(qn("w:sectPr"))

But I am getting this error:

ca.attrib.pop(qn("w:sectPr"))
  File "src\lxml\etree.pyx", line 2449, in lxml.etree._Attrib.pop
KeyError: '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}sectPr'

Can anybody please help me resolve this?

The <w:sectPr> item you are trying to remove is an element, not an attribute (of an element). So the error message is telling you that the w:sectPr element has no w:sectPr attribute, which of course it doesn't.
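To make the element/attribute distinction concrete, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of the lxml objects python-docx exposes. The XML string is a trimmed stand-in for the paragraph in the question, and the variable names are only illustrative:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed stand-in for the first paragraph in the question's XML
xml = (
    f'<w:p xmlns:w="{W}">'
    '<w:pPr><w:sectPr w:rsidR="006F7C29"/></w:pPr>'
    '</w:p>'
)
p = ET.fromstring(xml)
pPr = p.find(f"{{{W}}}pPr")
sectPr = pPr.find(f"{{{W}}}sectPr")

# Attributes live in .attrib; the only one on this element is w:rsidR
attribute_names = list(sectPr.attrib)

# Popping a non-existent "w:sectPr" attribute raises KeyError,
# which is the same kind of failure shown in the traceback
try:
    sectPr.attrib.pop(f"{{{W}}}sectPr")
    raised_key_error = False
except KeyError:
    raised_key_error = True

# An element is removed by asking its parent to drop it
pPr.remove(sectPr)
remaining_children = len(pPr)
```

ElementTree has no getparent(), so the parent (pPr) is held explicitly here; with the lxml elements that python-docx returns, sectPr.getparent() gives the same parent.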

I think what you're looking for is something like this:

def remove_all_but_last_section(document):
    for paragraph in document.paragraphs:
        p = paragraph._p
        sectPrs = p.xpath("./w:pPr/w:sectPr")
        if not sectPrs:
            continue
        sectPr = sectPrs[0]
        sectPr.getparent().remove(sectPr)

An alternative implementation, which is perhaps a bit more elegant and definitely would perform better (although either version would probably be very fast unless the document was huge):

def remove_all_but_last_section(document):
    sectPrs = document._element.xpath(".//w:pPr/w:sectPr")
    for sectPr in sectPrs:
        sectPr.getparent().remove(sectPr)
