Python: sort XML elements by value
I have a large XML file. I tried the ElementTree XML API for Python: I was able to parse the XML file by tags and then successfully generate a CSV file. Now I have a different problem with tags that share the same name, and with their information.
For example, the XML file contains repeated tags called User, one for each of many different users.
<User>
  <Number>145321</Number>
  <Name>Tony</Name>
  <Address>
    <City>Stockholm</City>
    <Country>Sweden</Country>
    <FullAddress>example address</FullAddress>
  </Address>
  <CustomerID>1234</CustomerID>
  <Accounts>
    <AccountID>8774</AccountID>
  </Accounts>
  <Payment></Payment>
</User>
After this structure comes another structure with the same name that describes a different user and their elements. How can this information be told apart? For example, if I want to find a user's name from their AccountID and then save it in CSV format, how can I do that?
The code below turns each "XML User" into a "Python User". Once you have the User class, it is easy to look up data.
from dataclasses import dataclass
import xml.etree.ElementTree as ET

@dataclass
class Address:
    city: str
    country: str
    full_address: str

@dataclass
class User:
    number: int
    name: str
    address: Address
    accounts: list[int]  # AccountID values taken from <Accounts>
xml = '''<Users>
<User>
<Number>145321</Number>
<Name>Tony</Name>
<Address>
<City>Stockholm</City>
<Country>Sweden</Country>
<FullAddress>example address</FullAddress>
</Address>
<CustomerID>1234</CustomerID>
<Accounts>
<AccountID>8774</AccountID>
</Accounts>
<Payment></Payment>
</User>
<User>
<Number>145441</Number>
<Name>Jack</Name>
<Address>
<City>London</City>
<Country>UK</Country>
<FullAddress>example address</FullAddress>
</Address>
<CustomerID>5588</CustomerID>
<Accounts>
<AccountID>1966</AccountID>
</Accounts>
<Payment></Payment>
</User>
</Users> '''
def _get_addr(ue):
    # Build an Address from the <Address> child of one <User> element
    ae = ue.find('Address')
    return Address(ae.find('City').text, ae.find('Country').text, ae.find('FullAddress').text)
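root = ET.fromstring(xml)
# Each <User> element is its own subtree, so find() inside the loop
# only sees that user's children; this is what keeps the users apart
user_elements = root.findall('.//User')
users = []
for ue in user_elements:
    accounts = [int(ac.text) for ac in ue.find('Accounts').findall('AccountID')]
    users.append(User(int(ue.find('Number').text), ue.find('Name').text, _get_addr(ue), accounts))
for user in users:
    print(user)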
# Look for Jack - linear search just for demo
for user in users:
    if user.name == 'Jack':
        print('Found')
Output:
User(number=145321, name='Tony', address=Address(city='Stockholm', country='Sweden', full_address='example address'), accounts=[8774])
User(number=145441, name='Jack', address=Address(city='London', country='UK', full_address='example address'), accounts=[1966])
Found
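The answer above stops at printing. To do what the question asks (find a user from an AccountID and save the result as CSV), one option is to build a dictionary from account ID to <User> element once, instead of scanning a list for every lookup, and then write the matches with the standard csv module. This is a minimal sketch, assuming each AccountID belongs to exactly one user; index_by_account and write_matches are hypothetical helper names, and the sample XML is trimmed to the fields used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data shaped like the question's XML (values copied from it, trimmed).
XML_DATA = '''<Users>
  <User>
    <Number>145321</Number>
    <Name>Tony</Name>
    <CustomerID>1234</CustomerID>
    <Accounts><AccountID>8774</AccountID></Accounts>
  </User>
  <User>
    <Number>145441</Number>
    <Name>Jack</Name>
    <CustomerID>5588</CustomerID>
    <Accounts><AccountID>1966</AccountID></Accounts>
  </User>
</Users>'''


def index_by_account(root):
    """Map each AccountID string to the <User> element that contains it.

    Assumes an AccountID appears under at most one user; a later duplicate
    would overwrite an earlier one.
    """
    index = {}
    for user in root.iter('User'):
        for acc in user.iter('AccountID'):
            index[acc.text.strip()] = user
    return index


def write_matches(root, account_ids, out):
    """Write one CSV row per requested AccountID that belongs to a known user."""
    index = index_by_account(root)
    writer = csv.writer(out)
    writer.writerow(['Number', 'Name', 'CustomerID', 'AccountID'])
    for acc_id in account_ids:
        user = index.get(acc_id)
        if user is None:
            continue  # unknown account ID: skip it
        writer.writerow([user.findtext('Number'), user.findtext('Name'),
                         user.findtext('CustomerID'), acc_id])


root = ET.fromstring(XML_DATA)
print(index_by_account(root)['1966'].findtext('Name'))  # Jack

buf = io.StringIO()
write_matches(root, ['8774'], buf)
print(buf.getvalue())
```

To write a real file instead of a string buffer, open it with newline='' as the csv module documentation recommends, e.g. `with open('user.csv', 'w', newline='') as f: write_matches(root, ['8774'], f)`.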
I found that this library has added a function for processing large files. Let me give it a try.
from simplified_scrapy import SimplifiedDoc, utils
doc = SimplifiedDoc(edit=False)
doc.loadFile('test.xml', lineByline=True)
users = []
for user in doc.getIterable('User'):
    AccountIDs = user.selects('AccountID').text
    if '8774' in AccountIDs:  # Look up AccountID
        users.append([
            user.Number.text, user.Name.text, user.CustomerID.text,
            user.Payment.text, ','.join(user.Address.children.text),
            ','.join(AccountIDs)
        ])
utils.save2csv('user.csv', users)
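The simplified_scrapy snippet above processes the file one User at a time. If you would rather avoid a third-party dependency, the standard library's xml.etree.ElementTree.iterparse can stream a large file in a similar way. This is a minimal sketch, not how simplified_scrapy works internally; it assumes the <User> elements are direct children of the document root (as in the question) and that you are searching for a single AccountID. stream_users and users_to_csv are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def stream_users(source, wanted_account):
    """Yield (Number, Name, CustomerID) for each <User> holding wanted_account.

    source may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        # At the 'end' event the whole <User> subtree has been parsed.
        if event == 'end' and elem.tag == 'User':
            accounts = [a.text for a in elem.iter('AccountID')]
            if wanted_account in accounts:
                yield (elem.findtext('Number'), elem.findtext('Name'),
                       elem.findtext('CustomerID'))
            # Drop finished users from the root so memory does not grow
            # with file size (relies on <User> being a direct child of root).
            root.clear()


def users_to_csv(source, wanted_account, out):
    """Write the matching users as CSV rows to the text stream out."""
    writer = csv.writer(out)
    writer.writerow(['Number', 'Name', 'CustomerID'])
    writer.writerows(stream_users(source, wanted_account))


SAMPLE = b'''<Users>
  <User><Number>145321</Number><Name>Tony</Name><CustomerID>1234</CustomerID>
    <Accounts><AccountID>8774</AccountID></Accounts></User>
  <User><Number>145441</Number><Name>Jack</Name><CustomerID>5588</CustomerID>
    <Accounts><AccountID>1966</AccountID></Accounts></User>
</Users>'''

buf = io.StringIO()
users_to_csv(io.BytesIO(SAMPLE), '8774', buf)
print(buf.getvalue())
```

Clearing the root rather than each <User> matters for large files: an emptied <User> element would otherwise stay attached to the root for the rest of the parse.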
Here are more examples. This library is easy to use.