Edit XML file text based on path
I have an XML file (e.g. jerry.xml) that contains some data, as shown below.
<data>
    <country name="Peru">
        <rank updated="yes">2</rank>
        <language>english</language>
        <currency>1.21$/kg</currency>
        <gdppc month="06">141100</gdppc>
        <gdpnp month="10">2.304e+0150</gdpnp>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <language>english</language>
        <currency>4.1$/kg</currency>
        <gdppc month="05">59900</gdppc>
        <gdpnp month="08">1.9e-015</gdpnp>
        <neighbor name="Malaysia" direction="N"/>
    </country>
</data>
I used the code below to extract the full paths of some selected text nodes from the XML above. The reason is given in this post.
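import xml.etree.ElementTree as ET

def extractNumbers(path, node):
    nums = []
    # Skip elements whose month attribute is '05' or '06'
    if 'month' in node.attrib:
        if node.attrib['month'] in ['05', '06']:
            return nums
    path += '/' + node.tag
    if 'name' in node.keys():
        path += '=' + node.attrib['name']
    elif 'year' in node.keys():
        # NOTE: this tests 'year' but reads 'month'; jerry.xml has no
        # 'year' attribute, so this branch never runs on the data above.
        path += ' ' + 'month' + '=' + node.attrib['month']
    # Record the element if its text parses as a number
    try:
        num = float(node.text)
        nums.append((path, num))
    except (ValueError, TypeError):
        pass
    # Recurse into the child elements
    for e in list(node):
        nums.extend(extractNumbers(path, e))
    return nums

tree = ET.parse('jerry.xml')
nums = extractNumbers('', tree.getroot())
print(len(nums))
print(nums)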
This gives me the locations of the elements I need to change, as shown in column 1 of the CSV below (e.g. hrong.csv).
Path; Text1; Text2; Text3; Text4; Text5
'/data/country name=singapore/gdpnp month=08'; 5.2e-015; 2e-05; 8e-06; 9e-04; 0.4e-05;
'/data/country name=peru/gdppc month=06'; 0.04; 0.02; 0.15; 3.24; 0.98;
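I want to replace the text of elements in the original XML file (jerry.xml) with the values in column 2 of hrong.csv above, using the element locations in column 1.
I'm new to Python and realize I may not be using the best approach. I'd appreciate any pointers. Basically, I just need to parse selected text nodes in a few XML files, modify those text nodes, and save each file.
Thanks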
You should be able to do this with the XPath support in the ElementTree module:
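import xml.etree.ElementTree as ET

tree = ET.parse('jerry.xml')
root = tree.getroot()
csv_value = '5.2e-015'  # replacement text, e.g. column 2 of hrong.csv
# Attribute matching is case-sensitive: the XML has "Singapore", not "singapore"
for data in root.findall(".//country[@name='Singapore']/gdpnp[@month='08']"):
    data.text = csv_value
tree.write("filename.xml")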
So you need to rewrite the paths in your CSV to match the XPath rules the module defines (see the supported XPath syntax).
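As a rough illustration of that rewriting step, here is a minimal sketch that turns a path in the hrong.csv style into an expression ElementTree can evaluate. The helper name `csv_path_to_xpath` is hypothetical, and the sketch assumes each segment is either `tag` or `tag attr=value`, that attribute values contain no `/` or quotes, and that the first segment is the root element (which is dropped because the search starts at the root).

```python
import xml.etree.ElementTree as ET

def csv_path_to_xpath(csv_path):
    """Convert "/data/country name=X/gdpnp month=08" to an ElementTree XPath.

    Hypothetical helper; assumes 'tag' or 'tag attr=value' segments.
    """
    # Remove surrounding whitespace, quotes, and the leading slash
    cleaned = csv_path.strip().strip("'\"").strip("/")
    segments = cleaned.split("/")
    steps = []
    for segment in segments[1:]:  # skip the root tag; we search from the root
        tag, _, predicate = segment.partition(" ")
        if predicate:
            attr, _, value = predicate.partition("=")
            steps.append("{}[@{}='{}']".format(tag, attr, value))
        else:
            steps.append(tag)
    return "./" + "/".join(steps)

xpath = csv_path_to_xpath("'/data/country name=Singapore/gdpnp month=08'")

# Small in-memory document to try the expression on
root = ET.fromstring(
    '<data><country name="Singapore">'
    '<gdpnp month="08">1.9e-015</gdpnp>'
    '</country></data>'
)
matches = root.findall(xpath)
```

Note that the comparison is case-sensitive, so a CSV entry such as `name=singapore` would not match `name="Singapore"` in the XML; the names have to agree exactly.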
First, see the documentation on how to modify an XML file. Now, here is my own example:
import xml.etree.ElementTree as ET

s = """
<root>
    <parent attribute="value">
        <child_1 other_attr="other_value">child text</child_1>
        <child_2 yet_another_attr="another_value">more child text</child_2>
    </parent>
</root>
"""
root = ET.fromstring(s)
# Iterate over elements directly; getchildren() was removed in Python 3.9
for parent in root:
    parent.attrib['attribute'] = 'new value'
    for child in parent:
        child.attrib['new_attrib'] = 'new attribute for {}'.format(child.tag)
        child.text += ', appended text!'
>>> ET.dump(root)
<root>
    <parent attribute="new value">
        <child_1 new_attrib="new attribute for child_1" other_attr="other_value">child text, appended text!</child_1>
        <child_2 new_attrib="new attribute for child_2" yet_another_attr="another_value">more child text, appended text!</child_2>
    </parent>
</root>
You can also do this with XPath.
>>> root.find('parent/child_1[@other_attr]').attrib['other_attr'] = 'found it!'
>>> ET.dump(root)
<root>
    <parent attribute="new value">
        <child_1 new_attrib="new attribute for child_1" other_attr="found it!">child text, appended text!</child_1>
        <child_2 new_attrib="new attribute for child_2" yet_another_attr="another_value">more child text, appended text!</child_2>
    </parent>
</root>
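One caveat with the `find()` call above: it returns `None` when nothing matches, so chaining `.attrib` or `.text` onto it raises `AttributeError`. A minimal sketch of a guarded alternative follows; `set_text` is a hypothetical helper name, not part of ElementTree, and it relies on `findall()` returning an empty list when there is no match.

```python
import xml.etree.ElementTree as ET

def set_text(root, xpath, new_text):
    """Set the text of every element matching xpath; return how many changed."""
    matches = root.findall(xpath)  # empty list when nothing matches
    for element in matches:
        element.text = new_text
    return len(matches)

root = ET.fromstring(
    '<root><parent attribute="value">'
    '<child_1 other_attr="other_value">child text</child_1>'
    '</parent></root>'
)

changed = set_text(root, "parent/child_1[@other_attr]", "replaced")
missing = set_text(root, "parent/no_such_child", "ignored")
```

Returning the match count makes it easy to log or reject CSV rows whose path does not correspond to anything in the XML.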
I have modified the extractNumbers function and the other code so that they generate relative XPaths from the file that is read in.
import xml.etree.ElementTree as ET

def extractNumbers(path, node):
    nums = []
    # You'll want to store a relative, rather than an absolute path.
    if not path:
        # This is the root node: start with './/' so the search covers
        # all of the root's descendants.
        path = ".//"
    else:
        # This is not the root node
        if 'month' in node.attrib:
            if node.attrib['month'] in ['05', '06']:
                return nums
        path += node.tag
        if 'name' in node.keys():
            path += '[@name="{:s}"]/'.format(node.attrib['name'])
        elif 'year' in node.keys():
            # NOTE: jerry.xml has no 'year' attribute, so this branch never
            # runs and the generated paths carry no month predicate.
            path += '[@month="{:s}"]/'.format(node.attrib['month'])
    # Record the element if its text parses as a number
    try:
        num = float(node.text)
        nums.append((path, num))
    except (ValueError, TypeError):
        pass
    # Descend into the node's child nodes
    for e in list(node):
        nums.extend(extractNumbers(path, e))
    return nums

tree = ET.parse('jerry.xml')
nums = extractNumbers('', tree.getroot())
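At this point you have a nums list filled with (path, num) tuples. You will want to write the paths to a CSV file. In what follows, I assume you know the Text1, Text2, and Text3 values in advance, so I simply wrote 'foo', 'bar', 'baz' in every row.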
import csv

# Write the CSV file with the data found by extractNumbers.
# newline='' is what the Python 3 csv module expects for files it writes.
with open('records.csv', 'w', newline='') as records:
    writer = csv.writer(records, delimiter=';')
    writer.writerow(['Path', 'Text1', 'Text2', 'Text3'])
    for entry in nums:
        # Ensure that you're writing a relative xpath
        rel_path = entry[0]
        # Replace 'foo' (the Text1 column) with the real value: it is the
        # text that gets written into the XML in the next step.
        writer.writerow([rel_path, 'foo', 'bar', 'baz'])
You will now have the following CSV file:
Path;Text1;Text2;Text3
".//country[@name=""Peru""]/rank";foo;bar;baz
".//country[@name=""Peru""]/gdpnp";foo;bar;baz
".//country[@name=""Singapore""]/rank";foo;bar;baz
".//country[@name=""Singapore""]/gdpnp";foo;bar;baz
The following code reads the CSV file and uses the Path column to change the appropriate values:
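import csv
import xml.etree.ElementTree as ET

tree = ET.parse('jerry.xml')  # the same file the paths were extracted from
with open('records.csv', 'r', newline='') as records:
    reader = csv.reader(records, delimiter=';')
    for row in reader:
        if reader.line_num == 1:
            continue  # skip the row of headers
        # row[0] is the XPath, row[1] is the Text1 replacement value
        for data in tree.findall(row[0]):
            data.text = row[1]
tree.write('jerry_new.xml')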
You will get the following result in jerry_new.xml:
<data>
    <country name="Peru">
        <rank updated="yes">foo</rank>
        <language>english</language>
        <currency>1.21$/kg</currency>
        <gdppc month="06">141100</gdppc>
        <gdpnp month="10">foo</gdpnp>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">foo</rank>
        <language>english</language>
        <currency>4.1$/kg</currency>
        <gdppc month="05">59900</gdppc>
        <gdpnp month="08">foo</gdpnp>
        <neighbor direction="N" name="Malaysia" />
    </country>
</data>
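To try the read-and-replace step without touching any files, the same idea can be run entirely in memory. This is only a sketch under stated assumptions: `apply_csv`, `XML_TEXT`, and `CSV_TEXT` are hypothetical names, the XML is a trimmed copy of jerry.xml, the CSV values are made up, and the second CSV row adds a `[@month="10"]` predicate purely to show that such predicates can be chained. It uses `csv.DictReader` so the header row is consumed automatically, and it reports paths that matched nothing instead of skipping them silently.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_TEXT = """<data>
  <country name="Peru">
    <rank updated="yes">2</rank>
    <gdpnp month="10">2.304e+0150</gdpnp>
  </country>
</data>"""

# Same layout as records.csv: ';' delimiter, quotes doubled inside a field
CSV_TEXT = (
    'Path;Text1\n'
    '".//country[@name=""Peru""]/rank";7\n'
    '".//country[@name=""Peru""]/gdpnp[@month=""10""]";0.04\n'
)

def apply_csv(root, csv_file):
    """Replace element text using the Path and Text1 columns.

    Returns the list of paths that matched no element.
    """
    unmatched = []
    for row in csv.DictReader(csv_file, delimiter=';'):
        elements = root.findall(row['Path'])
        if not elements:
            unmatched.append(row['Path'])
        for element in elements:
            element.text = row['Text1']
    return unmatched

root = ET.fromstring(XML_TEXT)
unmatched = apply_csv(root, io.StringIO(CSV_TEXT))
result = ET.tostring(root, encoding='unicode')
```

With real files, `io.StringIO(CSV_TEXT)` would be replaced by an open file handle and `ET.fromstring` by `ET.parse(...).getroot()`.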