How to extract specific attribute values from multiple tags in XML using Python
XML:
<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
<Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
<File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
<Stage description="" locale="n" name="DANGER"/>
</File>
<File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
<Stage description="" locale="n" name="SPINNERS"/>
</File>
<File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
<Stage description="" locale="n" name="CAUTION"/>
</File>
</Page>
Expected output in table format:
Id,item,stage_status,Number
1,1,True,05051401, ,DANGER
1,1,True,05051402, ,SPINNERS
1,1,True,05051404, ,CAUTION
I tried this code:
import csv
import xml.etree.ElementTree as ET
tree = ET.parse("status-1.1.xml")
root = tree.getroot()
with open('Data.csv', 'w') as f:
    w = csv.DictWriter(f, fieldnames=('Id', 'item', 'stage_status', 'Number','description','name'))
    w.writerheader()
    w.writerows(e.attrib for e in root.findall('.//Page/File/Stage'))
I'm trying to get values from both the File and Stage tags.
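A likely reason the attempt above writes no rows: the document declares a default namespace, so ElementTree stores every tag in the form `{http://gigabyte.com/documoto/Statuslist/1.6}File` rather than `File`. The path `.//Page/File/Stage` also looks for a `Page` element below the root, although the root is itself the `Page`. Separately, `csv.DictWriter` provides `writeheader()`, not `writerheader()`. The following is a minimal sketch of a namespace-aware lookup; it uses a shortened inline copy of the XML, and the prefix `d` is an arbitrary name chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: one File is enough to show the effect).
XML = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6">
<File Id="1" item="1" stage_status="true" Number="05051401">
<Stage description="" locale="n" name="DANGER"/>
</File>
</Page>"""

root = ET.fromstring(XML)

# Map an arbitrary prefix to the document's default namespace URI.
NS = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}

# The root tag carries the namespace in braces.
print(root.tag)

# Unqualified names match nothing; qualified names find the nested Stage.
no_ns = root.findall(".//Page/File/Stage")
with_ns = root.findall("d:File/d:Stage", NS)
print(len(no_ns), len(with_ns))
```

Attributes without a prefix are not placed in the default namespace, so `attrib` keys such as `Id` and `name` stay unqualified.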
from bs4 import BeautifulSoup as Soup
import pandas as pd
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
<Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
<File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
<Stage description="" locale="n" name="DANGER"/>
</File>
<File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
<Stage description="" locale="n" name="SPINNERS"/>
</File>
<File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
<Stage description="" locale="n" name="CAUTION"/>
</File>
</Page>
'''
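# The "lxml" HTML parser lower-cases tag and attribute names, hence 'file', 'id', 'number'.
xml_data = Soup(xml, features="lxml")
params = ['id','item','stage_status','number']
all_data = []
for i in xml_data.find_all("file"):
    # Note: the third value is the name of the nested <Stage>, stored under the 'stage_status' key.
    tmp_dict = dict(zip(params,[i['id'],i['item'],i.find('stage')['name'],i['number']]))
    all_data.append(tmp_dict)
df = pd.DataFrame(all_data)
df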
Output:
   id  item  stage_status    number
0   1     1        DANGER  05051401
1   2     2      SPINNERS  05051402
2   3     3       CAUTION  05051404
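For comparison, the same kind of table can be built with only the standard library by merging each File's attributes with those of its child Stage. This is a sketch under assumptions rather than a definitive implementation: the prefix `d`, the helper `file_rows`, and the in-memory buffer are names chosen here for illustration, and the XML is a trimmed inline copy of the sample.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed inline copy of the sample document from the question.
XML = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" tenantKey="Staus">
<Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
<File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
<Stage description="" locale="n" name="DANGER"/>
</File>
<File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
<Stage description="" locale="n" name="SPINNERS"/>
</File>
<File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
<Stage description="" locale="n" name="CAUTION"/>
</File>
</Page>"""

# Arbitrary prefix mapped to the document's default namespace.
NS = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}
FIELDS = ("Id", "item", "stage_status", "Number", "description", "name")


def file_rows(root):
    """Yield one dict per <File>, merged with its child <Stage> attributes."""
    for file_el in root.findall("d:File", NS):
        merged = dict(file_el.attrib)
        stage_el = file_el.find("d:Stage", NS)
        if stage_el is not None:
            merged.update(stage_el.attrib)
        # Keep only the wanted columns; missing ones become empty strings.
        yield {key: merged.get(key, "") for key in FIELDS}


rows = list(file_rows(ET.fromstring(XML)))

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a file instead, the buffer could be replaced with `open("Data.csv", "w", newline="")`. The `stage_status` column holds the literal string `true` from the XML; capitalising it to `True` would be an extra step.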