Parse nested XML using Python
I am trying to parse nested XML using Python. The sample file format is shown below.
<repositoryFileTreeDto>
  <children>
    <children>
      <file>
        <name> File1 </name>
        <path> home/user1/File1.txt </path>
      </file>
    </children>
    <children>
      <file>
        <name> File2 </name>
        <path> home/user1/File2.txt </path>
      </file>
    </children>
    <file>
      <name> User1 </name>
      <path> home/user1 </path>
    </file>
  </children>
  <children>
    <file>
      <name> User2 </name>
      <path> home/user2 </path>
    </file>
  </children>
  <children>
    <file>
      <name> User3 </name>
      <path> home/user3 </path>
    </file>
  </children>
  <children>
    <children>
      <file>
        <name> File4 </name>
        <path> home/user4/File4.txt </path>
      </file>
    </children>
    <file>
      <name> User4 </name>
      <path> home/user4 </path>
    </file>
  </children>
  <file>
    <name> Home </name>
    <path> /home </path>
  </file>
</repositoryFileTreeDto>
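I want to print the "empty user" folders and the "non-empty user" folders (i.e. users with one or more files).
In the XML snippet, User2 and User3 are empty folders, while User1 and User4 are non-empty users.
Conditions for identifying empty and non-empty users:
If the user has any <children> tag at the same level as its <file>, it is a non-empty user. If the user has no <children> tag, it is an empty user.
Sample code 1: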
import xml.etree.ElementTree as ET

tree = ET.parse('tree.xml')
root = tree.getroot()

for child in root.findall('children'):
    # each top-level <children> holds one user's own <file> entry
    for subchild in child.findall('file'):
        title = subchild.find('name').text.strip()
        print(title)
    # nested <children> elements are that user's files
    for subchild1 in child.findall('children'):
        if subchild1.tag == 'children':
            print(subchild1.tag)
Output of code 1:
User1
File1
File2
User2
User3
User4
File4
Sample code 2:
import xml.etree.ElementTree as ET

tree = ET.parse('tree.xml')
root = tree.getroot()

list_values = []  # created once, so every user appends to this same list
dicts = {}
for child in root.findall('children'):
    for sub_child in child.findall('file'):
        username = sub_child.find('name').text.strip()
    for sub_child1 in child.findall('children'):
        for sub_child2 in sub_child1.findall('file'):
            file_path = sub_child2.find('path').text.strip()
            file_title = sub_child2.find('name').text.strip()
            #print(username)
            #print(file_title)
            list_values.append(file_title)
            # loops over the characters of username; the assignment is the same each time
            for user in username:
                dicts[username] = list_values
print(dicts)
Output of code 2:
{'User1': ['File1', 'File2'],'User4': ['File1', 'File2', 'File4']}
In this output, User2 and User3 are not part of the dict because they are empty folders, and User4 is sharing User1's files.
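One likely reason User4 picks up User1's files is that list_values is created once, before the loop, so every user appends to the same list object. The sketch below isolates that effect; the `parsed` data and both helper functions are hypothetical stand-ins for illustration, not part of the original code.

```python
# Hypothetical stand-in for the parsed XML: (user name, file names found under it).
parsed = [("User1", ["File1", "File2"]), ("User2", []), ("User4", ["File4"])]

def collect_shared(parsed):
    """Mimics sample code 2: one list created once and reused for every user."""
    list_values = []
    result = {}
    for username, files in parsed:
        for name in files:
            list_values.append(name)
            result[username] = list_values  # every key points at the same list
    return result

def collect_per_user(parsed):
    """Builds a fresh list for each user, so entries are never shared."""
    result = {}
    for username, files in parsed:
        result[username] = list(files)
    return result

shared = collect_shared(parsed)
per_user = collect_per_user(parsed)

print(shared["User4"])   # contains User1's files as well as File4
print("User2" in shared) # empty users are never added in the shared version
print(per_user)          # each user has only its own files; empty users kept
```

Creating the list inside the per-user loop, as in collect_per_user, also keeps empty users in the dict, which makes counting them straightforward.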
Expected output:
The number of Empty Users: 2
The number of Non-Empty Users: 2
User1 Files are: File1, File2
User4 files are: File4
Thanks, everyone.
If you are willing to use the lxml package, you can get the result with XPath.
from lxml import etree
from operator import itemgetter
from collections import defaultdict

first = itemgetter(0)

with open('tree.xml') as fp:
    xml = etree.fromstring(fp.read())

# create a user dictionary and a file dictionary, both keyed by path
udict = {first(e.xpath('path/text()')).strip(): first(e.xpath('name/text()')).strip()
         for e in xml.xpath('children/file')}
fdict = {first(e.xpath('path/text()')).strip(): first(e.xpath('name/text()')).strip()
         for e in xml.xpath('children/children/file')}

# group file names under the path of their parent folder
ufdict = defaultdict(list)
for k, v in fdict.items():
    ufdict[first(k.rsplit('/', 1))].append(v)

# map each user name to its list of files (empty list if it has none)
out = {v: ufdict.get(k, []) for k, v in udict.items()}

print('The number of Empty Users: {}'.format(len([v for v in out.values() if not v])))
print('The number of Non-Empty Users: {}'.format(len([v for v in out.values() if v])))
for k, v in out.items():
    if v:
        print(f'{k} files are {", ".join(v)}')
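If installing lxml is not an option, roughly the same result can be sketched with only the standard-library xml.etree.ElementTree. This version follows the rule from the question (a user is non-empty when its element has nested <children>) rather than matching on paths. The `SAMPLE` string is a compacted copy of the question's XML, and `files_by_user` is a hypothetical helper name, so treat this as one possible approach rather than a definitive one.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the question's XML, embedded so the sketch runs on its own.
SAMPLE = """
<repositoryFileTreeDto>
  <children>
    <children><file><name> File1 </name><path> home/user1/File1.txt </path></file></children>
    <children><file><name> File2 </name><path> home/user1/File2.txt </path></file></children>
    <file><name> User1 </name><path> home/user1 </path></file>
  </children>
  <children>
    <file><name> User2 </name><path> home/user2 </path></file>
  </children>
  <children>
    <file><name> User3 </name><path> home/user3 </path></file>
  </children>
  <children>
    <children><file><name> File4 </name><path> home/user4/File4.txt </path></file></children>
    <file><name> User4 </name><path> home/user4 </path></file>
  </children>
  <file><name> Home </name><path> /home </path></file>
</repositoryFileTreeDto>
"""

def files_by_user(root):
    """Map each user name to the names of the files nested directly under it."""
    result = {}
    for user_el in root.findall('children'):
        # 'file/name' only matches the user's own <file>, not the nested ones.
        username = user_el.findtext('file/name', default='').strip()
        # A fresh list per user; each nested <children> holds one file.
        result[username] = [
            child.findtext('file/name', default='').strip()
            for child in user_el.findall('children')
        ]
    return result

users = files_by_user(ET.fromstring(SAMPLE.strip()))
empty = [u for u, files in users.items() if not files]
non_empty = {u: files for u, files in users.items() if files}

print(f'The number of Empty Users: {len(empty)}')
print(f'The number of Non-Empty Users: {len(non_empty)}')
for user, files in non_empty.items():
    print(f'{user} files are: {", ".join(files)}')
```

To run it against the real file, replace ET.fromstring(SAMPLE.strip()) with ET.parse('tree.xml').getroot().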