How to extract values from a nested Dictionary as Pandas DataFrames

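Hello Stack Overflowers,

I hope you can help me understand a problem I've run into with nested dictionaries. I scraped some tables from an Excel file: ['Table 5', 'Table 8', 'Table 40']. What my code gives me is a nested dictionary, and I'm not sure how to handle it. I guess these are the real pains of being a beginner. My goal is to use the keys (e.g. Table 5) to turn the values into DataFrames. Original table: [image]

Sample DataFrame: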

d = {0: ['TB','VT','BT','CI','CH','CL','RT','RU','PV','PV','PV','PV','PV','RH','PV','PV','PV','PV','NaN','NaN','TB','VT','BT','CI','CH','CL','RT','RU','PV','PV'], 
     1: ['Table 1','BRAND. SUMMARY','Base: Floating Base (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','brand1','brand2','brand3','brand4','NPS','','NaN','Row1','Row2','Row3','NaN','NaN','Table 5','Brands Title 1','Base: All (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','Brand1','Brand2'],
     2: ['NaN','NaN','NaN','(TOTAL)','Discrete monthly banner','Sept (a)','100','997','0.31','0.31','0.31','0.31','0.31','NaN','0.62','0.64','0.61','0.6','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Total','19479','19608','0.75','0.75'],
     3: ['NaN','NaN','NaN','NaN','NaN','Oct (b)','1090','1100','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','TOTAL','Discrete monthly banner','Sept (a)','1000','1000','0.8','0.8'],
     4: ['NaN','NaN','NaN','NaN','NaN','Nov (c)','3164','3191','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Oct (b)','1000','1000','0.8','0.8'],
     5: ['NaN','NaN','NaN','NaN','NaN','Dec (d)','992','3999','0.31','0.31','0.31','0.31','0.31','NaN','0.51','0.47','0.67','0.61','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Nov (c)','1000','1000','0.8','0.8']}

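When I print the table values and keys, this is what is returned:

[image]

Row 174 should be my column headers.

This is the code I used to scrape the tables from Excel: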

import pandas as pd

# Read the "Percents" sheet without a header row, skipping column A
ws = pd.read_excel(r'C:\Users\Tables.xlsx', sheet_name="Percents", header=None, usecols="B:XFD")

table_names = ["Table 5", "Table 8", "Table 9", "Table 40"]
# Start a new group at every row whose column 1 holds one of the table names
groups = ws[1].isin(table_names).cumsum()
# Key each group by its first cell and keep the 23 rows after the title row
tables = {g.iloc[0, 0]: g.iloc[1:24] for _, g in ws.groupby(groups)}
# The dict above also returned other values, so filter again by table name
filtered_d = {k: tables[k] for k in table_names if k in tables}

I tried modifying the code below to return my values, but when I remove orient="index" or set orient="columns", I get an error. I think a for loop might solve the problem.

df = pd.DataFrame.from_dict({(i,j): filtered_d[i][j] 
                           for i in filtered_d.keys() 
                           for j in filtered_d[i].keys()}, orient="index")

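How can I solve this while keeping the current table format and converting each value into a DataFrame?

Thanks in advance for any suggestions.

I'm not entirely sure what output you want, but with the example provided we can give it a try. Is this what you're after?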

import pandas as pd
df = pd.DataFrame({0: ['TB','VT','BT','CI','CH','CL','RT','RU','PV','PV','PV','PV','PV','RH','PV','PV','PV','PV','NaN','NaN','TB','VT','BT','CI','CH','CL','RT','RU','PV','PV'], 
     1: ['Table 1','BRAND. SUMMARY','Base: Floating Base (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','brand1','brand2','brand3','brand4','NPS','','NaN','Row1','Row2','Row3','NaN','NaN','Table 5','Brands Title 1','Base: All (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','Brand1','Brand2'],
     2: ['NaN','NaN','NaN','(TOTAL)','Discrete monthly banner','Sept (a)','100','997','0.31','0.31','0.31','0.31','0.31','NaN','0.62','0.64','0.61','0.6','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Total','19479','19608','0.75','0.75'],
     3: ['NaN','NaN','NaN','NaN','NaN','Oct (b)','1090','1100','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','TOTAL','Discrete monthly banner','Sept (a)','1000','1000','0.8','0.8'],
     4: ['NaN','NaN','NaN','NaN','NaN','Nov (c)','3164','3191','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Oct (b)','1000','1000','0.8','0.8'],
     5: ['NaN','NaN','NaN','NaN','NaN','Dec (d)','992','3999','0.31','0.31','0.31','0.31','0.31','NaN','0.51','0.47','0.67','0.61','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Nov (c)','1000','1000','0.8','0.8']})
tbl = df.drop(range(5), axis=0).drop(0, axis=1)
print(tbl)

Alternatively, you may want to name the rows and columns appropriately:

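# Row labels from the first column, column names from the first row
index = tbl.iloc[1:, 0]
columns = tbl.iloc[0, 1:]
# Data cells only: drop the first six rows and the first two columns
data = df.drop(range(6), axis=0).drop(range(2), axis=1)
# Pass a plain array so pandas does not re-align on the old labels (which would give all NaN)
tbl2 = pd.DataFrame(data.to_numpy(), index=index, columns=columns)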
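If the same clean-up is needed for every table in your dictionary, it may help to wrap it in a function. This is only a minimal sketch: promote_header, header_row and label_col are hypothetical names of mine, and it assumes the header row and the label column sit at known positions in each block and that the column labels are unique.

```python
import pandas as pd

def promote_header(raw: pd.DataFrame, header_row: int = 0, label_col: int = 0) -> pd.DataFrame:
    """Use one row of `raw` as column names and one column as row labels.

    Both arguments are positions (as with iloc), not labels.
    """
    label_name = raw.columns[label_col]
    # Column names come from the header row, skipping the label column itself
    columns = raw.iloc[header_row].drop(label_name)
    # The body is everything below the header row
    body = raw.iloc[header_row + 1:]
    index = body.iloc[:, label_col]
    data = body.drop(columns=label_name)
    # Plain arrays avoid pandas re-aligning on the old labels
    return pd.DataFrame(data.to_numpy(), index=index.to_numpy(), columns=columns.to_numpy())

# Toy block shaped like one scraped table (values kept as strings, as in the sample)
raw = pd.DataFrame({
    1: ["NaN", "Base", "Brand1", "Brand2"],
    2: ["Sept (a)", "1000", "0.8", "0.75"],
    3: ["Oct (b)", "1000", "0.8", "0.75"],
})
clean = promote_header(raw)

# Applying it to a dict of tables keeps the keys and converts each value
tables = {"Table 5": raw}
cleaned = {name: promote_header(t) for name, t in tables.items()}
```

With your real data you would pass filtered_d instead of the toy tables dict and adjust header_row to wherever the month labels land in each block.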

Either way, hopefully you can coerce it into the right format.
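One more thing that may be worth doing: in the sample, every cell is a string, including the 'NaN' placeholders. The sketch below shows one way to turn such a table into numbers with pd.to_numeric. The small tbl frame here is made-up illustration data, not your real table.

```python
import pandas as pd

# Made-up table with string cells, mimicking the sample data
tbl = pd.DataFrame(
    {"Sept (a)": ["1000", "0.8", "NaN"], "Oct (b)": ["1000", "0.75", ""]},
    index=["Base", "Brand1", "Brand2"],
)

# Convert column by column; anything that cannot be parsed becomes NaN
numeric = tbl.apply(pd.to_numeric, errors="coerce")
```

errors="coerce" silently turns unparseable cells into NaN, so it is worth checking numeric.isna().sum() afterwards to see how much was lost.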

