Rewrite XML and save context

I have XML strings like the following:

xml = """
<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""

I have a function called processLine(line) that takes a whole line (the text within <head>, without tags). In my case, these two lines will be processed by processLine:

1. Un livre sur Ronsard
2. La pitié des églises par Barrès

and concatenates a certain string to some words of each line, for example:

"Ronsard" becomes "Ronsard I-PER"
"Barrès"  becomes "Barrès I-PER"

Here is the code I've written so far using Python's lxml.etree library:

from lxml import etree

root = etree.fromstring(xml)
pars = root.xpath('//body//head')

for par in pars:
    # par.text alone would stop at the first child tag;
    # itertext() walks all nested text, giving the line stripped of tags
    line = ''.join(par.itertext())
    processLine(line)

My question: how can I save those changes in the XML file without losing its structure?

That is, the new XML file in my example would become:

newxml = """
<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
"""

You can set the tag's text property to what you need and then just call etree.tostring(rootElt, pretty_print=True).

Also note that I'm selecting all the <persName> tags, not the headings themselves:

pars = root.xpath('//body//head//persName')

Check this out:

from lxml import etree

xml = """
<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""

root = etree.fromstring(xml)
pars = root.xpath('//body//head//persName')

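for par in pars:
    line = par.text  # the text inside this <persName> element
    processLine(line)  # the asker's own function, defined elsewhere

    par.text = par.text + ' I-PER'

# encoding='unicode' makes tostring return a str instead of bytes
print(etree.tostring(root, encoding='unicode', pretty_print=True))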

This prints the following XML:

<body>
    <head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
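
To write the modified tree to a file instead of printing it, one option is to wrap the root element in an ElementTree and call its write method. The following is only a sketch: it uses a shortened version of the XML above, the file name output.xml is a hypothetical choice for this example, and the import falls back to the standard library's xml.etree.ElementTree (which shares the calls used here) when lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # fall back to the standard library
    import xml.etree.ElementTree as etree

# Shortened version of the XML from the question.
xml = """
<body>
    <head>1. Un livre sur <persName type="author">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName type="author">Barrès</persName></head>
</body>
"""

root = etree.fromstring(xml)

# Append the label to every <persName> element in the document.
for pers in root.iter('persName'):
    pers.text = (pers.text or '') + ' I-PER'

# "output.xml" is a hypothetical file name, placed in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'output.xml')

# Wrap the root in an ElementTree and write it to disk as UTF-8.
etree.ElementTree(root).write(out_path, encoding='utf-8', xml_declaration=True)

# Read the file back to check that the new text was saved.
saved_root = etree.parse(out_path).getroot()
names = [p.text for p in saved_root.iter('persName')]
print(names)
```

Only the text of the <persName> elements is touched, so the surrounding tags, attributes, and text of each <head> are written back unchanged.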

If you want to process all the headings and only then process the names, you may want to select the inner tag (persName) from the heading tag itself (head):

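for par in pars:  # here pars holds the <head> elements, e.g. root.xpath('//body//head')
    # ...

    # the leading "." makes the search relative to this <head>;
    # a bare '//persName' would match every <persName> in the document
    pers = par.xpath('.//persName')

    for per in pers:
        per.text = per.text + ' I-PER'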

This code gives exactly the same result, but within the processLine function you will still deal with the whole <head> tag, while the pers variable will contain all of that tag's <persName> descendants.
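
The per-heading approach can be sketched end to end as follows. This is an illustration under assumptions, not the original poster's code: it uses a shortened version of the XML, joins itertext() to recover each line without tags (an element's own .text stops at its first child), and uses findall('.//persName') as the relative search so the same code also runs on the standard library's xml.etree.ElementTree if lxml is missing.

```python
try:
    from lxml import etree
except ImportError:  # fall back to the standard library
    import xml.etree.ElementTree as etree

# Shortened version of the XML from the question.
xml = """
<body>
    <head>1. Un livre sur <persName type="author">Ronsard</persName></head>
    <head>2. <title>La pitié des églises</title> par <persName type="author">Barrès</persName></head>
</body>
"""

root = etree.fromstring(xml)

lines = []
for head in root.iter('head'):
    # itertext() yields the text of the element and of all its descendants,
    # including the text that follows each child tag, so joining the pieces
    # gives the whole line with the tags stripped.
    lines.append(''.join(head.itertext()))

    # The leading "." restricts the search to this <head> element.
    for pers in head.findall('.//persName'):
        pers.text = (pers.text or '') + ' I-PER'

names = [p.text for p in root.iter('persName')]
print(lines)
print(names)
```

Each entry of lines is the string that a function such as processLine would receive, collected before the <persName> text is modified.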
