Rewrite XML and save context
I have XML strings like the following:
xml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""
I have a function called processLine(line)
that takes a whole line (the text within <head>,
without tags). In my case, these two lines will be processed by the processLine
function:
1. Un livre sur Ronsard
2. La pitié des églises par Barrès
It then appends a certain string to some words of each line, for example:
"Ronsard" becomes "Ronsard I-PER"
"Barrès" becomes "Barrès I-PER"
Here is the code I have written so far using Python's lxml.etree library:
from lxml import etree
root = etree.fromstring(xml)
pars = root.xpath('//body//head')
for par in pars:
    # par.text would stop at the first child tag; itertext() collects the whole line without tags
    line = "".join(par.itertext())
    processLine(line)
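On a <head> element, .text is only the text that comes before its first child element (for the first heading, "1. Un livre sur "), so the full tag-stripped line has to be assembled from all of the text nodes. A minimal sketch comparing the options, assuming lxml is installed:

```python
from lxml import etree

xml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""

root = etree.fromstring(xml)
heads = root.xpath('//body//head')

# .text stops at the first child element
first_texts = [head.text for head in heads]
# itertext() yields every text node inside the element (children's text and tails), so tags are dropped
full_lines = ["".join(head.itertext()) for head in heads]
# the XPath string value of an element is the same concatenation
xpath_lines = [head.xpath('string()') for head in heads]

for line in full_lines:
    print(line)
```

Either full_lines or xpath_lines is what processLine expects as its input; first_texts shows why .text alone is not enough.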
My question: how can I save those changes in the XML file without losing its structure?
That is, in my example the new XML file would become:
newxml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
"""
You can set the tag's text
property to what you need and then just call etree.tostring(rootElt, pretty_print=True).
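Because the question is about saving to a file rather than printing, here is a minimal sketch, assuming lxml, that edits the <persName> text and writes the whole tree back out with lxml's ElementTree.write(). The output path (a file in the system temp directory) and the trimmed attributes are only for illustration:

```python
import os
import tempfile

from lxml import etree

xml = """<body>
<head>1. Un livre sur <persName type="author">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName type="author">Barrès</persName></head>
</body>"""

root = etree.fromstring(xml)
for per in root.xpath('//body//head//persName'):
    # only the element's text changes; the tags, attributes and surrounding text stay as they were
    per.text = per.text + ' I-PER'

# Wrap the root element in an ElementTree so it can be serialized straight to a file.
out_path = os.path.join(tempfile.gettempdir(), "output.xml")  # example path; use your own file
tree = etree.ElementTree(root)
tree.write(out_path, encoding="utf-8", xml_declaration=True, pretty_print=True)

# Or get the result as a Python str instead of writing a file.
as_text = etree.tostring(root, encoding="unicode")
print(as_text)
```

If the XML comes from a file in the first place, etree.parse(path) already returns an ElementTree, so you can modify it and call write() on it directly.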
Note that I'm selecting all the <persName>
tags, not the headings themselves:
pars = root.xpath('//body//head//persName')
Check this out:
from lxml import etree
xml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""
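def processLine(line):
    pass  # placeholder standing in for the asker's own function

root = etree.fromstring(xml)
pars = root.xpath('//body//head//persName')
for par in pars:
    line = par.text  # the text inside <persName>, e.g. "Ronsard"
    processLine(line)
    par.text = par.text + ' I-PER'
# encoding="unicode" returns a str; lxml's tostring() has no "unicode" keyword
print(etree.tostring(root, encoding="unicode", pretty_print=True))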
This prints the following XML:
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
If you want to process all the headings first and only then process the names, you may want to select the inner tag (persName
) from the heading tag itself (head
):
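pars = root.xpath('//body//head')
for par in pars:
    # ... process the whole heading here (e.g. with processLine)
    # './/' searches inside this <head> only; '//' would match every <persName> in the document
    pers = par.xpath('.//persName')
    for per in pers:
        per.text = per.text + ' I-PER'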
This code gives exactly the same result, but in the processLine
function you still deal with the whole <head>
tag, while the pers
variable contains all of that tag's <persName>
children.
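If processLine itself has to decide which words to tag (instead of relying on the existing <persName> markup), one structure-preserving alternative is to run it on each text node separately: an element's .text and each child's .tail. The sketch below assumes lxml and uses a hypothetical process_fragment() in place of processLine that tags a hard-coded list of names; a word split across two text nodes (for example, half inside a tag) would not be matched.

```python
from lxml import etree

xml = """<body>
<head>1. Un livre sur <persName type="author">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName type="author">Barrès</persName></head>
</body>"""


def process_fragment(fragment):
    """Hypothetical stand-in for processLine: append ' I-PER' to known names."""
    for name in ("Ronsard", "Barrès"):
        fragment = fragment.replace(name, name + " I-PER")
    return fragment


def rewrite_text_nodes(head, func):
    """Apply func to every text node under head without touching the tags."""
    for el in head.iter():
        if el.text:
            el.text = func(el.text)
        # a child's tail is the text after its closing tag, still inside head;
        # head's own tail lies outside the heading, so it is skipped
        if el is not head and el.tail:
            el.tail = func(el.tail)


root = etree.fromstring(xml)
for head in root.xpath('//body//head'):
    rewrite_text_nodes(head, process_fragment)

result = etree.tostring(root, encoding="unicode")
print(result)
```

Since only .text and .tail strings are reassigned, every element, attribute and nesting level from the input is kept in the serialized output.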
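Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.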