Rewrite XML and preserve context

Question

I have XML strings like the following:

xml = """
<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""

I have a function called processLine(line) that takes a whole line (the text within <head>, without tags). In my case, these two lines will be processed by processLine:

1. Un livre sur Ronsard
2. La pitié des églises par Barrès

and appends a certain string to some words of each line, for example:

"Ronsard" becomes "Ronsard I-PER"
"Barrès"  becomes "Barrès I-PER"

Here is the code I have written so far using lxml's etree module:

from lxml import etree

root = etree.fromstring(xml)
pars = root.xpath('//body//head')

for par in pars:
    line = par.text # return the line stripped from tags
    processLine( line )

My question: how can I save those changes in the XML file without losing its structure?

That is, the new XML file in my example would become:

newxml = """
<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
"""

Answer 1

You can set the tag's text property to what you need and then call etree.tostring(rootElt, pretty_print=True).

Note that I'm selecting all the <persName> tags, not the headings themselves:

pars = root.xpath('//body//head//persName')

Here is a complete example:

from lxml import etree

xml = """
<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""

root = etree.fromstring(xml)
pars = root.xpath('//body//head//persName')

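for par in pars:
    line = par.text  # text inside <persName>, e.g. "Ronsard"
    processLine(line)  # the asker's own function, defined elsewhere

    par.text = par.text + ' I-PER'

# encoding='unicode' makes tostring() return a str instead of bytes
print(etree.tostring(root, encoding='unicode', pretty_print=True))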

This prints the following XML:

<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
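
The question asks about saving the result to a file, while the example above only prints it. A minimal sketch of one way to write the modified tree to disk follows. It uses the standard library's xml.etree.ElementTree so that it runs without extra packages; lxml.etree offers the same fromstring, iter and ElementTree(...).write calls. The output path and the shortened sample XML are assumptions made for illustration only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# Shortened sample, assumed for illustration
xml = """<body>
    <head>1. Un livre sur <persName type="author">Ronsard</persName></head>
</body>"""

root = ET.fromstring(xml)

# Changing .text leaves attributes, tails and sibling elements untouched
for pers in root.iter('persName'):
    pers.text = (pers.text or '') + ' I-PER'

# Hypothetical output path, used only for this sketch
out_path = os.path.join(tempfile.gettempdir(), 'annotated.xml')

# Wrap the root element in an ElementTree so it can be written to disk
ET.ElementTree(root).write(out_path, encoding='utf-8', xml_declaration=True)

with open(out_path, encoding='utf-8') as fh:
    saved = fh.read()
print(saved)
```

Only the text of the <persName> elements changes, so the surrounding markup is written back as it was parsed.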

If you want to process each heading as a whole and only then process the names, you may want to select the inner tag (persName) from the heading tag itself (head):

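pars = root.xpath('//body//head')

for par in pars:
    # ...

    # the leading '.' keeps the search inside this <head>;
    # a bare '//persName' would match every <persName> in the document
    pers = par.xpath('.//persName')

    for per in pers:
        per.text = per.text + ' I-PER'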

This code gives the same result, but processLine can still work with the whole <head> tag, while the pers variable contains all of that tag's <persName> descendants.
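
To hand the whole line to a function while still editing only the names, one option is to build the tag-free text with itertext() and then search relative to each <head>. The sketch below is written under assumptions: process_line is a hypothetical stand-in for the asker's processLine, the sample XML is shortened, and the standard library's xml.etree.ElementTree is used in place of lxml (iter, itertext and findall exist in both).

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# Shortened sample, assumed for illustration
xml = """<body>
    <head>1. Un livre sur <persName type="author">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName type="author">Barrès</persName></head>
</body>"""


def process_line(line):
    # Hypothetical stand-in for the asker's processLine
    return line.strip()


root = ET.fromstring(xml)
lines = []

for head in root.iter('head'):
    # itertext() yields the element's text plus the text and tails of
    # all its descendants, in document order: the line without tags
    line = ''.join(head.itertext())
    lines.append(process_line(line))

    # './/persName' is relative to this <head> only
    for pers in head.findall('.//persName'):
        pers.text = (pers.text or '') + ' I-PER'

result = ET.tostring(root, encoding='unicode')
print(lines)
print(result)
```

Note that element.text on a <head> would return only the text before its first child element, which is why itertext() is used to rebuild the full line.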

Solution 1
1 2015-03-31 22:04:08