I have XML strings like the following:
xml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""
I have a function called processLine(line) that takes a whole line (the text within <head>, without tags). In my case, these two lines will be processed by the processLine function:
1. Un livre sur Ronsard
2. La pitié des églises par Barrès
The function appends a certain string to some words of each line, for example:
"Ronsard" becomes "Ronsard I-PER"
"Barrès" becomes "Barrès I-PER"
Here is the code I've written so far using lxml's etree module:
from lxml import etree
root = etree.fromstring(xml)
pars = root.xpath('//body//head')
for par in pars:
    # par.text alone stops at the first child element, so join all text nodes
    line = "".join(par.itertext())  # the whole line, stripped of tags
    processLine(line)
My question: how can I save those changes in the XML file without losing its structure?
That is, in my example the new XML would become:
newxml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
"""
You can set the tag's text property to what you need and then just call etree.tostring(rootElt, pretty_print=True).
Also note: I'm selecting all the <persName> tags, not the headings themselves:
pars = root.xpath('//body//head//persName')
Check this out:
from lxml import etree
xml = """
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès</persName></head>
</body>
"""
root = etree.fromstring(xml)
pars = root.xpath('//body//head//persName')
for par in pars:
    line = par.text  # the text of this <persName> element
    processLine(line)  # the asker's own function, defined elsewhere
    par.text = par.text + ' I-PER'
# encoding='unicode' makes tostring return a str instead of bytes
print(etree.tostring(root, encoding='unicode', pretty_print=True))
This prints the following XML:
<body>
<head>1. Un livre sur <persName type="author" key="Ronsard, Pierre de (1524-1585)" ref="http://www.idref.fr/027107957">Ronsard I-PER</persName></head>
<head>2. <title>La pitié des églises</title> par <persName key="Barrès, Maurice (1862-1923)" ref="http://www.idref.fr/026706601" type="author">Barrès I-PER</persName></head>
</body>
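Since the question asks about saving the result to a file rather than printing it, here is a minimal sketch of that step. The file name output.xml and the shortened XML sample are assumptions made for illustration, and the fallback import exists only so the sketch also runs where lxml is not installed (it sticks to calls that both libraries provide):

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

# Shortened sample (assumption): attributes trimmed for brevity
xml = """<body>
<head>1. Un livre sur <persName type="author">Ronsard</persName></head>
</body>"""

root = etree.fromstring(xml)
for per in root.iter("persName"):
    per.text = (per.text or "") + " I-PER"

# "output.xml" is a hypothetical file name, written to the temp directory
out_path = os.path.join(tempfile.gettempdir(), "output.xml")
etree.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to confirm that the structure survived the round trip
reloaded = etree.parse(out_path).getroot()
saved_names = [per.text for per in reloaded.iter("persName")]
```

Only the text of <persName> is touched, so the surrounding text of each <head> and all attributes are written back unchanged.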
If you want to process whole headings first and only then tag the names, you can select the inner tag (persName) relative to each heading tag (head):
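for par in pars:  # here pars = root.xpath('//body//head')
    # ...
    # The leading dot keeps the search inside this <head>; a bare '//persName'
    # would match every <persName> in the document on each iteration.
    pers = par.xpath('.//persName')
    for per in pers:
        per.text = per.text + ' I-PER'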
This code gives exactly the same result, but within the processLine function you still deal with the whole <head> tag, while the pers variable contains all of that tag's <persName> descendants.
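To make the per-heading approach concrete, here is a small sketch that collects the full text of each <head> (which is what processLine expects) and then tags only the names inside that heading. It is an illustration under assumptions rather than the asker's actual pipeline: the XML is shortened, processLine is replaced by simply collecting the lines, and the fallback import is there only so the sketch runs without lxml.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree

# Shortened sample (assumption): attributes removed for brevity
xml = """<body>
<head>1. Un livre sur <persName>Ronsard</persName></head>
<head>2. <title>La pitié des églises</title> par <persName>Barrès</persName></head>
</body>"""

root = etree.fromstring(xml)

lines = []
for head in root.iter("head"):
    # head.text is only the text before the first child element;
    # itertext() walks every text and tail node inside the heading.
    lines.append("".join(head.itertext()))
    # Relative path: only the <persName> elements inside this heading
    for per in head.findall(".//persName"):
        per.text = (per.text or "") + " I-PER"

first_head_text = root.find("head").text
result = etree.tostring(root, encoding="unicode")
```

Here lines holds the two tag-free sentences from the question, while first_head_text shows that .text on its own stops at the first child element.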