
Using python-docx, how can I associate an XML namespace prefix?

I am trying to add a checkbox to a Word document using the python-docx library. I've narrowed the checkbox XML down to two possible options: the first comes from inspecting the word/document.xml file of an existing document, and the second from the XML Schema. When I try to insert the XML element into a simple document, I receive the error "lxml.etree.XMLSyntaxError: Namespace prefix xsd on complexType is not defined".

This is what I'm currently trying (using XML from the Schema):

from docx import Document
from docx.oxml import parse_xml

def word_docs(emails):
    # The xsd prefix is not declared anywhere in this string, which is what
    # triggers the XMLSyntaxError quoted above.
    cbox = parse_xml('<xsd:complexType name="CT_FFCheckBox"><xsd:sequence>  \
                <xsd:choice><xsd:element name="size" type="CT_HpsMeasure"/> \
                <xsd:element name="sizeAuto" type="CT_OnOff"/></xsd:choice> \
                <xsd:element name="default" type="CT_OnOff" minOccurs="0"/> \
                <xsd:element name="checked" type="CT_OnOff" minOccurs="0"/> \
                </xsd:sequence></xsd:complexType>')

    doc = Document()
    title = doc.add_heading("Document", 0)
    table = doc.add_table(rows = 1, cols = 4)
    table.style = 'TableGrid'

    row = table.rows[0]
    row.cells[0].text = "Test"

    merged = (row.cells[1].merge(row.cells[2]))
    merged._tc._add_p()
    ....

The following is the XML pulled from an existing document:

<w:tc>
<w:tcPr>
    <w:tcW w:w="4788" w:type="dxa"/>
</w:tcPr>
<w:p wsp:rsidR="00834643" wsp:rsidRPr="00834643" wsp:rsidRDefault="00F12FD5" wsp:rsidP="00834643">
    <w:pPr>
        <w:spacing w:after="0" w:line="240" w:line-rule="auto"/>
    </w:pPr>
    <w:r>
        <w:fldChar w:fldCharType="begin">
            <w:fldData xml:space="preserve">/////2UAAAAUAAYAQwBoAGUAYwBrADEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</w:fldData>
        </w:fldChar>
    </w:r>
    <aml:annotation aml:id="1" w:type="Word.Bookmark.Start" w:name="Check2"/>
    <w:r>
        <w:instrText> FORMCHECKBOX </w:instrText>
    </w:r>
    <w:r>
        <w:fldChar w:fldCharType="end"/>
    </w:r>
    <aml:annotation aml:id="1" w:type="Word.Bookmark.End"/>
</w:p>

I've been able to add the namespace xmlns:xsd="http://www.w3.org/2001/XMLSchema" to a document by hand, and it seems to open correctly. I am just unsure how to do this in a Pythonic way to automate the process. My XML object manipulation through python-docx may be incorrect, but it is what makes sense to me after comparing the XML format with the python-docx objects and the way they are handled. I haven't been able to test it because of this error.

Any help is appreciated! Thanks!

Ah, okay, your comment explains it. The MS Word 2003 XML format is not the same as the MS Word 2007 format (which, by the way, is inherently XML and requires no conversion).

To view the XML of a Word 2007 or later .docx file, you simply unzip it (it is a Zip archive). You may need to add a .zip extension first, depending on which tool you use for the unzipping. You'll be interested in the word/document.xml file in the resulting tree. I think you'll find that the bookmark appears as a <w:bookmarkStart> and <w:bookmarkEnd> element pair, which will not require any additions to the built-in namespaces of python-docx.
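The unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names read_document_xml and bookmark_names are made up for this example, and the tiny in-memory archive at the bottom is a stand-in for a real file, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by .docx files (Word 2007 and later).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(docx_file):
    """Return the raw bytes of word/document.xml from a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as archive:
        return archive.read("word/document.xml")


def bookmark_names(document_xml):
    """List the w:name attribute of every w:bookmarkStart element."""
    root = ET.fromstring(document_xml)
    return [
        element.get("{%s}name" % W_NS)
        for element in root.iter("{%s}bookmarkStart" % W_NS)
    ]


# Hypothetical stand-in for a real .docx: a zip holding only word/document.xml.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr(
        "word/document.xml",
        '<w:document xmlns:w="%s"><w:body><w:p>'
        '<w:bookmarkStart w:id="0" w:name="Check1"/>'
        '<w:bookmarkEnd w:id="0"/>'
        "</w:p></w:body></w:document>" % W_NS,
    )
sample.seek(0)

print(bookmark_names(read_document_xml(sample)))
```

For a real file, pass its path instead of the in-memory buffer, e.g. read_document_xml("existing.docx"), where the file name is only a placeholder.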

See this GitHub issue for more details and an example: github.com/python-openxml/python-docx/issues/403.
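As for the error message in the question itself: an XML parser rejects any prefix that is not bound to a namespace URI on the element or one of its ancestors. The sketch below illustrates that rule with the standard library's xml.etree.ElementTree rather than lxml (which python-docx uses internally), so the exception type and wording differ from the lxml error quoted in the question. The parses helper and the sample fragments are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by .docx files (Word 2007 and later).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def parses(xml_text):
    """Return True if xml_text is well-formed, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The w prefix is never declared here, so the parser rejects the fragment.
undeclared = '<w:bookmarkStart w:id="0" w:name="Check1"/>'

# Binding the prefix with an xmlns:w attribute makes the same fragment valid.
declared = '<w:bookmarkStart xmlns:w="%s" w:id="0" w:name="Check1"/>' % W_NS

print(parses(undeclared))
print(parses(declared))
```

In python-docx, the nsdecls() helper in docx.oxml.ns produces this kind of xmlns attribute text for the prefixes the library already knows, such as w, so it can be formatted into the root element of a string before passing it to parse_xml(). The xsd prefix from the question is presumably not among those, which would be consistent with the advice above to use the .docx element names instead.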
