
Editing Items in an XML File in Python

I'm trying to take data from a .csv file and create an individual .xml file for each row. I've already read the .csv into Pandas. Where I'm struggling is figuring out how to make edits in the .xml files.

I'm using this previous answer as a guide to try to learn this:

Link

Applying the author's solution to my data would look something like this:

data = """<annotation>
    <folder>VOC2007</folder>
    <filename>abc.jpg</filename>
    <object>
        <name>blah</name>
        <pose>unknown</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>0</xmin>
            <ymin>0</ymin>
            <xmax>0</xmax>
            <ymax>0</ymax>
        </bndbox>
    </object>
</annotation>
"""

Then I do this:

import xml.etree.ElementTree as et

tree = et.fromstring(data)

Where I'm stuck is the next part. The author edits their file with these lines of code:

for data in tree.findall("data"):
    name = data.attrib["name"]
    value = data.find("value")
    value.text = "[%s] %s" % (name, value.text)

I try to apply it to my own data like this:

for data in tree.findall("data"):  
    filename = data.find("filename")
    filename.text = "001.jpg"

But this doesn't seem to change anything when I print it out.

print(et.tostring(tree))

What am I doing wrong, or what steps do I need to take to change the name of the image from 'abc.jpg' to '001.jpg'?

I'm also trying to figure out how to change the values of the four items xmin, ymin, xmax, and ymax.

I assume you have read your CSV file and extracted a collection of dictionary-like records, for instance:

record = {
    'folder': "VOC2007",
    'filename': "abc.jpg",
    'name': "blah",
    'pose': "unknown",
    'truncated': "0",
    'difficult': "0",
    'xmin': "0",
    'ymin': "0",
    'xmax': "0",
    'ymax': "0",
}

A simple option is to use a string template to generate your XML content, since its structure is very simple:

import textwrap

template = textwrap.dedent("""\
<annotation>
    <folder>{folder}</folder>
    <filename>{filename}</filename>
    <object>
        <name>{name}</name>
        <pose>{pose}</pose>
        <truncated>{truncated}</truncated>
        <difficult>{difficult}</difficult>
        <bndbox>
            <xmin>{xmin}</xmin>
            <ymin>{ymin}</ymin>
            <xmax>{xmax}</xmax>
            <ymax>{ymax}</ymax>
        </bndbox>
    </object>
</annotation>""")

To generate your XML content you can do:

from xml.sax.saxutils import escape

escaped = {k: escape(v) for k, v in record.items()}
data = template.format(**escaped)

The function xml.sax.saxutils.escape converts "<", ">" and "&" into the XML entities &lt;, &gt; and &amp;, so values containing those characters cannot break the markup.

The result is:

<annotation>
    <folder>VOC2007</folder>
    <filename>abc.jpg</filename>
    <object>
        <name>blah</name>
        <pose>unknown</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>0</xmin>
            <ymin>0</ymin>
            <xmax>0</xmax>
            <ymax>0</ymax>
        </bndbox>
    </object>
</annotation>
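To produce one file per CSV row, the template approach can be wrapped in a loop. The following is only a minimal sketch under stated assumptions: it uses the standard-library csv.DictReader rather than Pandas, and the shortened template, the column names, the write_annotations helper, and the output file naming are all hypothetical choices made for illustration.

```python
import csv
import io
import tempfile
from pathlib import Path
from xml.sax.saxutils import escape

# Shortened, hypothetical template; the full one above works the same way.
TEMPLATE = """\
<annotation>
    <filename>{filename}</filename>
    <object>
        <name>{name}</name>
    </object>
</annotation>
"""


def write_annotations(csv_file, out_dir):
    """Write one XML file per CSV row and return the paths written."""
    out_dir = Path(out_dir)
    paths = []
    for row in csv.DictReader(csv_file):
        # Escape every value so "<", ">" and "&" cannot break the XML.
        escaped = {key: escape(value) for key, value in row.items()}
        # Assumed naming scheme: abc.jpg -> abc.xml
        path = out_dir / (Path(row["filename"]).stem + ".xml")
        path.write_text(TEMPLATE.format(**escaped), encoding="utf-8")
        paths.append(path)
    return paths


# Example input standing in for the real .csv file.
sample_csv = io.StringIO("filename,name\nabc.jpg,cat & dog\ndef.jpg,blah\n")
output_dir = tempfile.mkdtemp()
written = write_annotations(sample_csv, output_dir)
first_xml = written[0].read_text(encoding="utf-8")
```

If the rows are already in a Pandas DataFrame, something like df.astype(str).to_dict("records") should give comparable dictionaries, though that is untested here.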

My preference lies in using xmltodict. But from the link you posted, it seems you want to call .find("filename") from within the <annotation> tag and not a <data> tag (which isn't present in your XML data, as is also stated in a comment).

That is, your code could be changed "minimally" (I don't know ElementTree well enough to say what the best solution is) to something like:

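# tree is already the root <annotation> element, so findall("annotation")
# would return nothing; iter() also yields the root when its tag matches.
for annotation in tree.iter("annotation"):
    filename = annotation.find("filename")
    filename.text = "001.jpg"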
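For completeness, here is a small sketch of how the whole edit might be done with ElementTree, assuming XML shaped like the sample in the question (trimmed slightly here). Because et.fromstring returns the root <annotation> element itself, children such as <filename> can be reached by calling find on tree directly. The new coordinate values are made up for illustration.

```python
import xml.etree.ElementTree as et

data = """<annotation>
    <folder>VOC2007</folder>
    <filename>abc.jpg</filename>
    <object>
        <name>blah</name>
        <bndbox>
            <xmin>0</xmin>
            <ymin>0</ymin>
            <xmax>0</xmax>
            <ymax>0</ymax>
        </bndbox>
    </object>
</annotation>
"""

# fromstring returns the root element, which here is <annotation> itself.
tree = et.fromstring(data)

# <filename> is a direct child of the root, so find() locates it.
tree.find("filename").text = "001.jpg"

# Hypothetical new box coordinates; element text must be a string.
new_box = {"xmin": 10, "ymin": 20, "xmax": 110, "ymax": 220}
bndbox = tree.find("object/bndbox")
for tag, value in new_box.items():
    bndbox.find(tag).text = str(value)

result = et.tostring(tree, encoding="unicode")
print(result)
```

The path "object/bndbox" only reaches the first <object>; a file with several objects would presumably need a loop over tree.findall("object") instead.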
