How to export a 2D table to a CSV file using PyCharm
I have an XML file, 'product.xml'. Here is a sample of its contents:
<?xml version="1.0"?>
<Rowset>
<ROW>
<Product_ID>32</Product_ID>
<Company_ID>2</Company_ID>
<User_ID>90</User_ID>
<Product_Type>1</Product_Type>
<Application_ID>BBC#:1010</Application_ID>
</ROW>
<ROW>
<Product_ID>22</Product_ID>
<Company_ID>4</Company_ID>
<User_ID>190</User_ID>
<Product_Type>2</Product_Type>
<Application_ID>NBA#:1111</Application_ID>
</ROW>
<ROW>
<Product_ID>63</Product_ID>
<Company_ID>4</Company_ID>
<User_ID>99</User_ID>
<Product_Type>1</Product_Type>
<Application_ID>BBC#:1212</Application_ID>
</ROW>
<ROW>
<Product_ID>22</Product_ID>
<Company_ID>2</Company_ID>
<User_ID>65</User_ID>
<Product_Type>2</Product_Type>
<Application_ID>NBA#:2210</Application_ID>
</ROW>
</Rowset>
Here is my code:
# xml.etree.cElementTree was removed in Python 3.9; ElementTree is the replacement
import xml.etree.ElementTree as ET

tree = ET.parse('product.xml')
root = tree.getroot()
for rows in root:
    for attr in rows:
        if attr.tag == 'User_ID':
            print('User_ID: ' + attr.text)
        if attr.tag == 'Application_ID':
            print('Application_ID: ' + attr.text)
The output is:
User_ID: 90
Application_ID: BBC#:1010
User_ID: 190
Application_ID: NBA#:1111
User_ID: 99
Application_ID: BBC#:1212
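I would like to know how to use a Pandas DataFrame to build a 2D table with "Application_ID" and "User_ID" as the headers and their values as the data beneath them, for example: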
Application_ID User_ID
BBC#:1010 90
NBA#:1111 190
BBC#:1212 99
I also want to export the resulting 2D table to a CSV file. Thank you.
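Try:
import numpy as np
import pandas as pd

def parse_row(row):
    # Default to NaN so that a missing tag becomes a missing value
    ret = {'User_ID': np.nan, 'Application_ID': np.nan}
    for attr in row:
        if attr.tag in ret:
            ret[attr.tag] = attr.text
    return ret

# root is the element returned by tree.getroot() in the question
out = pd.DataFrame([parse_row(r) for r in root])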
Output:
User_ID Application_ID
0 90 BBC#:1010
1 190 NBA#:1111
2 99 BBC#:1212
3 65 NBA#:2210
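The NaN defaults in parse_row matter when a ROW lacks one of the tags. The following is a minimal sketch of that case, assuming a hypothetical two-row sample (SAMPLE_XML is not from the original file) in which the second ROW has no User_ID element:

```python
import xml.etree.ElementTree as ET

import numpy as np
import pandas as pd

# Hypothetical sample: the second ROW has no User_ID element.
SAMPLE_XML = """<Rowset>
  <ROW><User_ID>90</User_ID><Application_ID>BBC#:1010</Application_ID></ROW>
  <ROW><Application_ID>NBA#:1111</Application_ID></ROW>
</Rowset>"""


def parse_row(row):
    # Start from NaN so that a missing tag shows up as a missing value.
    ret = {'User_ID': np.nan, 'Application_ID': np.nan}
    for attr in row:
        if attr.tag in ret:
            ret[attr.tag] = attr.text
    return ret


root = ET.fromstring(SAMPLE_XML)
out = pd.DataFrame([parse_row(r) for r in root])
print(out)
```

The second row keeps its Application_ID and gets a missing value for User_ID, so the loop does not fail on incomplete rows.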
Pandas can read most common file types into DataFrames.
import pandas as pd

### This line would get you all of your columns
### (pd.read_xml needs pandas 1.3 or later; its default parser uses lxml)
df = pd.read_xml('product.xml')
### Drop (remove) unwanted columns
df.drop(['Product_ID', 'Company_ID', 'Product_Type'], axis=1, inplace=True)
### Export to csv
df.to_csv('outputfile.csv')
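As a variation on the read_xml approach, the wanted columns can be selected by name instead of dropping the others, which also fixes their order. This is a rough sketch, assuming a hypothetical two-row sample held in a string (SAMPLE_XML) and parser="etree", which relies on the standard library parser instead of lxml:

```python
import io

import pandas as pd

# Hypothetical two-row sample in the same shape as product.xml.
SAMPLE_XML = """<Rowset>
  <ROW>
    <Product_ID>32</Product_ID>
    <Company_ID>2</Company_ID>
    <User_ID>90</User_ID>
    <Product_Type>1</Product_Type>
    <Application_ID>BBC#:1010</Application_ID>
  </ROW>
  <ROW>
    <Product_ID>22</Product_ID>
    <Company_ID>4</Company_ID>
    <User_ID>190</User_ID>
    <Product_Type>2</Product_Type>
    <Application_ID>NBA#:1111</Application_ID>
  </ROW>
</Rowset>"""

# parser="etree" uses xml.etree from the standard library, so lxml is not needed.
df = pd.read_xml(io.StringIO(SAMPLE_XML), parser="etree")

# Keep only the wanted columns, in the desired order.
df = df[["Application_ID", "User_ID"]]
print(df)
```

Selecting by name keeps working if the XML later gains extra fields, whereas a drop list would need to be updated.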
Something like the following:
import xml.etree.ElementTree as ET
import pandas as pd
xml = '''<?xml version="1.0"?>
<Rowset>
<ROW>
<Product_ID>32</Product_ID>
<Company_ID>2</Company_ID>
<User_ID>90</User_ID>
<Product_Type>1</Product_Type>
<Application_ID>BBC#:1010</Application_ID>
</ROW>
<ROW>
<Product_ID>22</Product_ID>
<Company_ID>4</Company_ID>
<User_ID>190</User_ID>
<Product_Type>2</Product_Type>
<Application_ID>NBA#:1111</Application_ID>
</ROW>
<ROW>
<Product_ID>63</Product_ID>
<Company_ID>4</Company_ID>
<User_ID>99</User_ID>
<Product_Type>1</Product_Type>
<Application_ID>BBC#:1212</Application_ID>
</ROW>
<ROW>
<Product_ID>22</Product_ID>
<Company_ID>2</Company_ID>
<User_ID>65</User_ID>
<Product_Type>2</Product_Type>
<Application_ID>NBA#:2210</Application_ID>
</ROW>
</Rowset>
'''
FIELDS = ['Application_ID','User_ID']
data = []
root = ET.fromstring(xml)
for row in root.findall('.//ROW'):
    data.append([row.find(f).text for f in FIELDS])
df = pd.DataFrame(data,columns=FIELDS)
print(df)
Output:
Application_ID User_ID
0 BBC#:1010 90
1 NBA#:1111 190
2 BBC#:1212 99
3 NBA#:2210 65
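The question also asks for the table to be saved as a CSV file. A minimal sketch of that last step follows, assuming the DataFrame holds the four rows printed above; the temporary directory is only there to keep the example self-contained, and the file name outputfile.csv is an arbitrary choice:

```python
import os
import tempfile

import pandas as pd

# The same rows as the DataFrame printed above, rebuilt here for a standalone example.
df = pd.DataFrame(
    [["BBC#:1010", "90"], ["NBA#:1111", "190"],
     ["BBC#:1212", "99"], ["NBA#:2210", "65"]],
    columns=["Application_ID", "User_ID"],
)

# Write to a temporary directory; any writable path would do.
out_path = os.path.join(tempfile.mkdtemp(), "outputfile.csv")

# index=False leaves the 0, 1, 2, ... row labels out of the file.
df.to_csv(out_path, index=False)

# Read the file back as text to check what was written.
with open(out_path, encoding="utf-8") as fh:
    lines = fh.read().splitlines()
print(lines)
```

Without index=False, the file would start with an extra unnamed column holding the row labels.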