Python: sorting XML elements by value

Question

I have a large XML file. I used the ElementTree XML API for Python to parse the file by tags, and from that I successfully generated a CSV file. Now I have a different problem: tags that share the same name, and the information they contain.

For example, the XML file contains many tags with the same name, User, one for each of many different users:

<User>
    <Number>145321</Number>
    <Name>Tony</Name>
    <Address>
        <City>Stockholm</City>
        <Country>Sweden</Country>
        <FullAddress>example address</FullAddress>
    </Address>
    <CustomerID>1234</CustomerID>
    <Accounts>
        <AccountID>8774</AccountID>
    </Accounts>
    <Payment></Payment>
</User>

After this structure comes another structure with the same name that describes a different user and its elements. How can this information be told apart? For example, if I want to find a user's name by AccountID number and then save it in CSV format, how can I do that?

Answer 1

The code below turns each XML User element into a Python User object. Once you have the User class, looking up data is easy.

from dataclasses import dataclass
from typing import List
import xml.etree.ElementTree as ET


@dataclass
class Address:
    city: str
    country: str
    full_address: str


@dataclass
class User:
    number: int
    name: str
    address: Address
    accounts: List[int]  # AccountID values belonging to this user


xml = '''<Users>
    <User>
        <Number>145321</Number>
        <Name>Tony</Name>
        <Address>
            <City>Stockholm</City>
            <Country>Sweden</Country>
            <FullAddress>example address</FullAddress>
        </Address>
        <CustomerID>1234</CustomerID>
        <Accounts>
            <AccountID>8774</AccountID>
        </Accounts>
        <Payment></Payment>
    </User>
    <User>
        <Number>145441</Number>
        <Name>Jack</Name>
        <Address>
            <City>London</City>
            <Country>UK</Country>
            <FullAddress>example address</FullAddress>
        </Address>
        <CustomerID>5588</CustomerID>
        <Accounts>
            <AccountID>1966</AccountID>
        </Accounts>
        <Payment></Payment>
    </User>
</Users> '''


def _get_addr(ue):
    # Build an Address from the <Address> child of one <User> element
    ae = ue.find('Address')
    return Address(ae.find('City').text, ae.find('Country').text, ae.find('FullAddress').text)


root = ET.fromstring(xml)
# Each <User> is a separate Element object, so users with the same tag never mix
user_elements = root.findall('.//User')
users = []
for ue in user_elements:
    users.append(User(
        int(ue.find('Number').text),
        ue.find('Name').text,
        _get_addr(ue),
        [int(ac.text) for ac in ue.find('Accounts').findall('AccountID')],
    ))

for user in users:
    print(user)
# Look for Jack - linear search just for demo
for user in users:
    if user.name == 'Jack':
        print('Found')

Output

User(number=145321, name='Tony', address=Address(city='Stockholm', country='Sweden', full_address='example address'), accounts=[8774])
User(number=145441, name='Jack', address=Address(city='London', country='UK', full_address='example address'), accounts=[1966])
Found
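The question specifically asks how to find a user's name by AccountID and save the result as CSV, which the demo above does not show. Below is a minimal sketch, assuming the same Users/User layout as the example; the helper names index_by_account and write_users_csv and the chosen columns are hypothetical. Instead of a linear search, it builds a dictionary from each AccountID to its User element, then writes the match with the standard csv module.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = '''<Users>
    <User>
        <Number>145321</Number>
        <Name>Tony</Name>
        <CustomerID>1234</CustomerID>
        <Accounts><AccountID>8774</AccountID></Accounts>
    </User>
    <User>
        <Number>145441</Number>
        <Name>Jack</Name>
        <CustomerID>5588</CustomerID>
        <Accounts><AccountID>1966</AccountID></Accounts>
    </User>
</Users>'''


def index_by_account(root):
    """Hypothetical helper: map each AccountID text to the <User> element containing it."""
    index = {}
    for user in root.iter('User'):
        # findall() on this element only searches inside this one user,
        # so repeated <User> tags are never confused with each other
        for acc in user.findall('Accounts/AccountID'):
            index[acc.text.strip()] = user
    return index


def write_users_csv(users, out):
    """Hypothetical helper: write Number, Name, CustomerID of each user to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['Number', 'Name', 'CustomerID'])
    for user in users:
        writer.writerow([user.findtext('Number'), user.findtext('Name'),
                         user.findtext('CustomerID')])


root = ET.fromstring(XML)
index = index_by_account(root)
match = index.get('8774')
print(match.findtext('Name'))  # Tony

buf = io.StringIO()
write_users_csv([match], buf)
print(buf.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open('user.csv', 'w', newline='') as out. The dictionary makes each lookup constant time once the index is built, which matters when many AccountIDs must be resolved.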

Answer 2

I found that this library has added support for processing large files. Here is an attempt using it:

from simplified_scrapy import SimplifiedDoc, utils

doc = SimplifiedDoc(edit=False)
doc.loadFile('test.xml', lineByline=True)

users = []
for user in doc.getIterable('User'):
    AccountIDs = user.selects('AccountID').text
    if '8774' in AccountIDs: # Look up AccountID
        users.append([
            user.Number.text, user.Name.text, user.CustomerID.text,
            user.Payment.text, ','.join(user.Address.children.text),
            ','.join(AccountIDs)
        ])

utils.save2csv('user.csv', users)

Here are more examples. This library is easy to use.
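Because the file in the question is large, a standard-library alternative to the third-party package above is xml.etree.ElementTree.iterparse, which streams the document so the whole tree never has to be in memory at once. This is a swapped-in technique, not what the answer uses. The function name stream_users_by_account and the output columns are hypothetical, and the sketch assumes User elements are not nested inside one another.

```python
import csv
import io
import xml.etree.ElementTree as ET


def stream_users_by_account(source, account_id, out):
    """Hypothetical helper: stream <User> elements and write those owning account_id as CSV.

    `source` may be a filename or a binary file object.
    """
    writer = csv.writer(out)
    writer.writerow(['Number', 'Name', 'CustomerID', 'AccountIDs'])
    context = ET.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # An 'end' event means the element and all of its children are parsed
        if event != 'end' or elem.tag != 'User':
            continue
        accounts = [a.text.strip() for a in elem.findall('Accounts/AccountID')]
        if account_id in accounts:
            writer.writerow([elem.findtext('Number'), elem.findtext('Name'),
                             elem.findtext('CustomerID'), ','.join(accounts)])
        # Drop finished users from the root so memory does not grow with file size
        root.clear()


sample = b'''<Users>
  <User><Number>145321</Number><Name>Tony</Name><CustomerID>1234</CustomerID>
        <Accounts><AccountID>8774</AccountID></Accounts></User>
  <User><Number>145441</Number><Name>Jack</Name><CustomerID>5588</CustomerID>
        <Accounts><AccountID>1966</AccountID></Accounts></User>
</Users>'''

buf = io.StringIO()
stream_users_by_account(io.BytesIO(sample), '8774', buf)
print(buf.getvalue())
```

With a real file, something like stream_users_by_account('test.xml', '8774', out) with out = open('user.csv', 'w', newline='') should behave the same way.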


Question

2 answers

Answer 1
1 2020-10-21 15:37:30

Answer 2
1 2020-10-22 12:14:12
