How to extract specific attribute values from multiple tags in XML using Python

XML:

<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>

Expected output in table format:

Id,item,stage_status,Number,description,name
1,1,True,05051401, ,DANGER
2,2,True,05051402, ,SPINNERS
3,3,True,05051404, ,CAUTION

I tried this code:

import csv
import xml.etree.ElementTree as ET

tree = ET.parse("status-1.1.xml")
root = tree.getroot()

with open('Data.csv', 'w') as f:
    w = csv.DictWriter(f, fieldnames=('Id', 'item', 'stage_status', 'Number','description','name'))
    w.writeheader()
    w.writerows(e.attrib for e in root.findall('.//Page/File/Stage'))

I'm trying to get values from both the File and Stage tags.
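A likely reason the attempt above finds no Stage elements is the default namespace declared on Page: ElementTree stores every tag as `{namespace}Tag`, so a plain `File` or `Stage` in a path matches nothing. The path also starts at a Page child, although the root itself is Page. A minimal sketch of the difference, using a trimmed copy of the XML (one File only) and an arbitrary prefix `d` that is my own choice, not part of the document:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question (one File element, no declaration).
XML = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6">
  <File Id="1" item="1" stage_status="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
</Page>"""

root = ET.fromstring(XML)

# The path from the question: no namespace, and it looks for a <Page>
# below the root even though the root itself is <Page>.
no_match = root.findall(".//Page/File/Stage")

# Namespace-aware path, relative to the root. "d" is an arbitrary prefix.
ns = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}
stages = root.findall("d:File/d:Stage", ns)

print(root.tag)                          # {http://gigabyte.com/documoto/Statuslist/1.6}Page
print(no_match)                          # []
print([s.get("name") for s in stages])   # ['DANGER']
```

Attribute names such as `Id` or `name` carry no namespace here, so only the element names in the path need the prefix.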

One answer, using BeautifulSoup and pandas:

from bs4 import BeautifulSoup as Soup
import pandas as pd

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6" xmlns:xs="http://www.w3.org/2001/XMLSchema" hashKey="MDAwNTgxMzQtQS0xLjEuc3Zn" pageFile="status-1.1.svg" tenantKey="Staus">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>
'''
xml_data = Soup(xml, features="lxml")


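# The lxml HTML parser lowercases tag and attribute names, so look up
# "file", "id", "number", etc. in lowercase.
params = ['id', 'item', 'stage_status', 'number', 'name']
all_data = []
for i in xml_data.find_all("file"):
    # File attributes, plus the name of the nested Stage element
    values = [i['id'], i['item'], i['stage_status'], i['number'], i.find('stage')['name']]
    all_data.append(dict(zip(params, values)))
df = pd.DataFrame(all_data)
df  # shows the table in a notebook; use print(df) in a script

Output:

    id  item    stage_status    number      name
0   1   1       true            05051401    DANGER
1   2   2       true            05051402    SPINNERS
2   3   3       true            05051404    CAUTION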
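If the goal is the CSV file from the question rather than a DataFrame, the same result can probably be reached with the standard library alone. The sketch below is one possible approach, not the answerer's method: it merges each File element's attributes with those of its child Stage and lets `csv.DictWriter` drop the unwanted ones. The names `NS`, `FIELDS` and `file_rows` are illustrative, and the XML is inlined and written to an in-memory buffer so the example is self-contained.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML = """<Page xmlns="http://gigabyte.com/documoto/Statuslist/1.6">
  <Stage description="SPREADER,GB/DD" locale="en" name="SPREADER,GB/DD"/>
  <File Price="0.0" Id="1" item="1" stage_status="true" ForPage="true" Number="05051401">
    <Stage description="" locale="n" name="DANGER"/>
  </File>
  <File Price="0.0" Id="2" item="2" stage_status="true" ForPage="true" Number="05051402">
    <Stage description="" locale="n" name="SPINNERS"/>
  </File>
  <File Price="0.0" Id="3" item="3" stage_status="true" ForPage="true" Number="05051404">
    <Stage description="" locale="n" name="CAUTION"/>
  </File>
</Page>"""

# "d" is an arbitrary prefix bound to the document's default namespace.
NS = {"d": "http://gigabyte.com/documoto/Statuslist/1.6"}
FIELDS = ("Id", "item", "stage_status", "Number", "description", "name")


def file_rows(root):
    """Yield one dict per <File>, merged with its child <Stage> attributes."""
    for file_el in root.findall("d:File", NS):
        row = dict(file_el.attrib)
        stage = file_el.find("d:Stage", NS)
        if stage is not None:
            row.update(stage.attrib)
        yield row


root = ET.fromstring(XML)
out = io.StringIO()
# extrasaction="ignore" skips attributes not listed in FIELDS
# (Price, ForPage, locale) instead of raising ValueError.
writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(file_rows(root))
print(out.getvalue())
# Id,item,stage_status,Number,description,name
# 1,1,true,05051401,,DANGER
# 2,2,true,05051402,,SPINNERS
# 3,3,true,05051404,,CAUTION
```

To work with the real files, `root` could come from `ET.parse("status-1.1.xml").getroot()` and `out` from `open("Data.csv", "w", newline="")`. The values are written exactly as they appear in the XML, so `stage_status` comes out as `true` rather than `True`.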
