Parsing nested XML with Python

Question

I'm trying to parse nested XML using Python. The sample file format looks like this:

<repositoryFileTreeDto>
    <children>
        <children>
            <file>
                <name> File1 </name>
                <path> home/user1/File1.txt </path>
            </file>
        </children>
        <children>
            <file>
                <name> File2 </name>
                <path> home/user1/File2.txt </path>
            </file>
        </children>
        <file>
            <name> User1 </name>
            <path> home/user1 </path>
        </file>
    </children>
    <children>
        <file>
            <name> User2 </name>
            <path> home/user2 </path>
        </file>
    </children>
    <children>
        <file>
            <name> User3 </name>
            <path> home/user3 </path>
        </file>
    </children>
    <children>
        <children>
            <file>
                <name> File4 </name>
                <path> home/user4/File4.txt </path>
            </file>
        </children>
        <file>
            <name> User4 </name>
            <path> home/user4 </path>
        </file>
    </children>
    <file>
        <name> Home </name>
        <path> /home </path>
    </file>
</repositoryFileTreeDto>

I want to print the empty user folders and the non-empty user folders (i.e. users with one or more files).

In the XML snippet above, User2 and User3 are empty folders and User1 is a non-empty user.

Condition to identify empty and non-empty users:

If the user has any tag at the same level, it is a non-empty user. If the user doesn't have such a tag, it is an empty user.

Sample Code 1:

import xml.etree.ElementTree as ET
import time
import requests
import csv
tree = ET.parse('tree.xml')
root = tree.getroot()
for child in root.findall('children'):
    for subchlid in child.findall('file'):
        title = subchlid.find('name').text
        print(title)
    for subchlid1 in child.findall('children'):
        if subchlid1.tag == 'children':
            print(subchlid1.tag)

Code Output 1:

User1
File1
File2
User2
User3
User4
File4

Sample Code 2:

import xml.etree.ElementTree as ET
import time
import requests
import csv
tree = ET.parse('tree.xml')
root = tree.getroot()
list_values = []
dicts = {}
for child in root.findall('children'):
    for sub_child in child.findall('file'):
        username = sub_child.find('name').text

    for sub_child1 in child.findall('children'):
        for sub_child2 in sub_child1.findall('file'):
            file_path = sub_child2.find('path').text
            file_title = sub_child2.find('name').text
        #print(username)
        #print(file_title)
        list_values.append(file_title)
        for user in username:
            dicts[username] = list_values
print(dicts)

Code Output 2:

{'User1': ['File1', 'File2'],'User4': ['File1', 'File2', 'File4']}

In this output, User2 and User3 are not part of the dict because they are empty folders, and User4 is sharing User1's files.
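One thing visible in Sample Code 2 is that list_values is created once, before the outer loop, and is never reset, so names collected for an earlier user are still in the list when a later user is stored. A minimal sketch of that effect and of one possible fix follows. The `folders` list and both function names are hypothetical stand-ins for the parsed XML, not part of the original code.

```python
# Hypothetical stand-in for the parsed XML: (user, files) pairs.
folders = [("User1", ["File1", "File2"]), ("User2", []), ("User4", ["File4"])]


def collect_shared(folders):
    """Mimic Sample Code 2: a single list is reused for every user."""
    shared = []
    result = {}
    for user, files in folders:
        for name in files:
            shared.append(name)
            result[user] = shared  # every key refers to the same list object
    return result


def collect_per_user(folders):
    """Build a fresh list for each user, so nothing carries over."""
    result = {}
    for user, files in folders:
        result[user] = list(files)
    return result


print(collect_shared(folders))
print(collect_per_user(folders))
```

With the per-user version, users without files also appear in the result with an empty list, which makes counting empty users straightforward.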

Expected Output:

The number of Empty Users: 2
The number of Non-Empty Users: 2
User1 Files are: File1, File2
User4 files are: File4

Thanks, everyone.

Answer 1

If you are open to using the lxml package, you can use XPath to get your result.

from lxml import etree
from operator import itemgetter
from collections import defaultdict

first = itemgetter(0)
with open('tree.xml') as fp:
    xml = etree.fromstring(fp.read())

# create a user dictionary and a file dictionary
udict = {first(e.xpath('path/text()')).strip(): first(e.xpath('name/text()')).strip()
         for e in xml.xpath('children/file')}

fdict = {first(e.xpath('path/text()')).strip(): first(e.xpath('name/text()')).strip()
         for e in xml.xpath('children/children/file')}

ufdict = defaultdict(list)

for k,v in fdict.items():
    ufdict[first(k.rsplit('/',1))].append(v)

out = {v: ufdict.get(k, []) for k, v in udict.items()}

# users whose file list is empty
print('The number of Empty Users: {}'.format(len([v for v in out.values() if not v])))
# users with at least one file
print('The number of Non-Empty Users: {}'.format(len([v for v in out.values() if v])))
for k, v in out.items():
    if v:
        print(f'{k} files are {", ".join(v)}')
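If installing lxml is not an option, roughly the same result can be sketched with the standard library's xml.etree.ElementTree. This variant groups files by their position in the tree (a file nested under a user's children element) rather than by path prefix. That grouping is an assumption about the data, not something the approach above relies on. `SAMPLE` is a shortened stand-in for tree.xml, and `files_by_user` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for tree.xml, same structure as the sample in the question.
SAMPLE = """<repositoryFileTreeDto>
    <children>
        <children>
            <file><name> File1 </name><path> home/user1/File1.txt </path></file>
        </children>
        <children>
            <file><name> File2 </name><path> home/user1/File2.txt </path></file>
        </children>
        <file><name> User1 </name><path> home/user1 </path></file>
    </children>
    <children>
        <file><name> User2 </name><path> home/user2 </path></file>
    </children>
    <file><name> Home </name><path> /home </path></file>
</repositoryFileTreeDto>"""


def files_by_user(root):
    """Map each user name to the names of the files nested under that user."""
    result = {}
    for user_node in root.findall('children'):
        # The user's own <file> is a direct child of its <children> element.
        user = user_node.findtext('file/name').strip()
        # The user's files sit one level deeper, each in its own <children>.
        result[user] = [f.findtext('name').strip()
                        for f in user_node.findall('children/file')]
    return result


root = ET.fromstring(SAMPLE)
out = files_by_user(root)
empty = [u for u, files in out.items() if not files]
non_empty = [u for u, files in out.items() if files]

print('The number of Empty Users: {}'.format(len(empty)))
print('The number of Non-Empty Users: {}'.format(len(non_empty)))
for user in non_empty:
    print('{} files are: {}'.format(user, ', '.join(out[user])))
```

To run it against the real file, replace ET.fromstring(SAMPLE) with ET.parse('tree.xml').getroot().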
