
Parse Nested XML using Python

I'm trying to parse nested XML using Python. The sample file format looks like this:

<repositoryFileTreeDto>
    <children>
        <children>
            <file>
                <name> File1 </name>
                <path> home/user1/File1.txt </path>
            </file>
        </children>
        <children>
            <file>
                <name> File2 </name>
                <path> home/user1/File2.txt </path>
            </file>
        </children>
        <file>
            <name> User1 </name>
            <path> home/user1 </path>
        </file>
    </children>
    <children>
        <file>
            <name> User2 </name>
            <path> home/user2 </path>
        </file>
    </children>
    <children>
        <file>
            <name> User3 </name>
            <path> home/user3 </path>
        </file>
    </children>
    <children>
        <children>
            <file>
                <name> File4 </name>
                <path> home/user4/File4.txt </path>
            </file>
        </children>
        <file>
            <name> User4 </name>
            <path> home/user4 </path>
        </file>
    </children>
    <file>
        <name> Home </name>
        <path> /home </path>
    </file>
</repositoryFileTreeDto>

I want to print the empty user folders and the non-empty user folders (i.e. users with one or more files).

In the XML snippet above, User2 and User3 are empty folders, while User1 and User4 are non-empty.

Condition to identify empty and non-empty users:

If the user's <file> element has any <children> tag at the same level, it is a non-empty user. If there is no such tag, it is an empty user.
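The condition can be sketched with the standard library alone. This is a minimal illustration, assuming the tag meant in the condition is `children` sitting next to the user's `file` element; the `classify_users` helper and the shortened `SAMPLE` string are hypothetical names made up for this example, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample with the same layout as the file above.
SAMPLE = """<repositoryFileTreeDto>
    <children>
        <children>
            <file><name> File1 </name><path> home/user1/File1.txt </path></file>
        </children>
        <file><name> User1 </name><path> home/user1 </path></file>
    </children>
    <children>
        <file><name> User2 </name><path> home/user2 </path></file>
    </children>
    <file><name> Home </name><path> /home </path></file>
</repositoryFileTreeDto>"""


def classify_users(root):
    """Return (empty, non_empty) lists of user folder names."""
    empty, non_empty = [], []
    for folder in root.findall('children'):
        user = folder.find('file')
        if user is None:
            continue  # no user entry directly inside this <children>
        name = user.findtext('name', default='').strip()
        # A <children> sibling of the user's <file> means the folder has content.
        if folder.find('children') is not None:
            non_empty.append(name)
        else:
            empty.append(name)
    return empty, non_empty


empty, non_empty = classify_users(ET.fromstring(SAMPLE))
print('Empty users:', empty)
print('Non-empty users:', non_empty)
```

For the real file, `ET.parse('tree.xml').getroot()` could be passed in place of the parsed sample string.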

Sample Code 1:

import xml.etree.ElementTree as ET

tree = ET.parse('tree.xml')
root = tree.getroot()
for child in root.findall('children'):
    # The user folder is the <file> directly under each top-level <children>
    for subchild in child.findall('file'):
        title = subchild.find('name').text.strip()
        print(title)
    # Files in that folder sit inside nested <children> elements
    for subchild1 in child.findall('children'):
        for nested_file in subchild1.findall('file'):
            print(nested_file.find('name').text.strip())

Code Output 1:

User1
File1
File2
User2
User3
User4
File4

Sample Code 2:

import xml.etree.ElementTree as ET

tree = ET.parse('tree.xml')
root = tree.getroot()
list_values = []  # created only once, so every user ends up sharing this list
dicts = {}
for child in root.findall('children'):
    for sub_child in child.findall('file'):
        username = sub_child.find('name').text.strip()

    for sub_child1 in child.findall('children'):
        for sub_child2 in sub_child1.findall('file'):
            file_path = sub_child2.find('path').text.strip()
            file_title = sub_child2.find('name').text.strip()
        list_values.append(file_title)
        dicts[username] = list_values
print(dicts)

Code Output 2:

{'User1': ['File1', 'File2'],'User4': ['File1', 'File2', 'File4']}

In this output, User2 and User3 are missing from the dict because they are empty folders, and User4 wrongly includes User1's files.
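One likely cause of the shared entries is that a single list is created before the loop and reused for every user. The sketch below builds a new list per user instead, and also keeps empty users in the dict. It is only an illustration: `files_by_user` and the shortened `SAMPLE` string are hypothetical names, and it assumes each user folder is a top-level `children` element with its files one level deeper.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened sample with the same layout as the file above.
SAMPLE = """<repositoryFileTreeDto>
  <children>
    <children><file><name> File1 </name><path> home/user1/File1.txt </path></file></children>
    <children><file><name> File2 </name><path> home/user1/File2.txt </path></file></children>
    <file><name> User1 </name><path> home/user1 </path></file>
  </children>
  <children>
    <file><name> User2 </name><path> home/user2 </path></file>
  </children>
  <children>
    <children><file><name> File4 </name><path> home/user4/File4.txt </path></file></children>
    <file><name> User4 </name><path> home/user4 </path></file>
  </children>
  <file><name> Home </name><path> /home </path></file>
</repositoryFileTreeDto>"""


def files_by_user(root):
    """Map each user folder name to the names of the files nested under it."""
    files = {}
    for folder in root.findall('children'):
        user = folder.find('file')
        if user is None:
            continue
        username = user.findtext('name', default='').strip()
        # A fresh list per user, so entries cannot leak from one user to the next.
        files[username] = [
            f.findtext('name', default='').strip()
            for f in folder.findall('children/file')
        ]
    return files


files = files_by_user(ET.fromstring(SAMPLE))
print(files)
```

Because empty users map to an empty list, counting them afterwards is a matter of checking which values are empty.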

Expected Output:

The number of Empty Users: 2
The number of Non-Empty Users: 2
User1 Files are: File1, File2
User4 files are: File4

Thanks, everyone.

If you are open to using the lxml package, you can use XPath to get your result.

from lxml import etree
from operator import itemgetter
from collections import defaultdict

first = itemgetter(0)
with open('tree.xml') as fp:
    xml = etree.fromstring(fp.read())

# create a user dictionary and a file dictionary
udict = {first(e.xpath('path/text()')).strip(): first(e.xpath('name/text()')).strip()
         for e in xml.xpath('children/file')}

fdict = {first(e.xpath('path/text()')).strip(): first(e.xpath('name/text()')).strip()
         for e in xml.xpath('children/children/file')}

ufdict = defaultdict(list)

for k,v in fdict.items():
    ufdict[first(k.rsplit('/',1))].append(v)

out = {v: ufdict.get(k, []) for k, v in udict.items()}

print('The number of Empty Users: {}'.format(len([v for v in out.values() if not v])))
print('The number of Non-Empty Users: {}'.format(len([v for v in out.values() if v])))
for k, v in out.items():
    if v:
        print(f'{k} files are {", ".join(v)}')
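The grouping step in the code above relies on each file's path starting with its user's folder path. A small standalone sketch of just that step, using paths copied from the sample XML (the `file_paths` and `files_by_folder` names are made up for this illustration):

```python
from collections import defaultdict

# File path -> file name, as they appear in the sample XML after stripping.
file_paths = {
    'home/user1/File1.txt': 'File1',
    'home/user1/File2.txt': 'File2',
    'home/user4/File4.txt': 'File4',
}

files_by_folder = defaultdict(list)
for path, name in file_paths.items():
    # Everything before the last '/' is the parent folder, e.g. 'home/user1'.
    parent = path.rsplit('/', 1)[0]
    files_by_folder[parent].append(name)

print(dict(files_by_folder))
```

Note that this matches files to users purely by path text, so it assumes the `path` values are consistent with the nesting; a path without any '/' would be treated as its own parent.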
