How to extract values from a nested dictionary as Pandas DataFrames

Question

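Hello Stack Overflowers,

I hope you can help me understand a problem I have run into with nested dictionaries. I scraped some tables from an Excel file: ['Table 5', 'Table 8', 'Table 40']. What I get back from the code I am using is a nested dictionary, and I am not sure how to handle it. I guess these are the real pains of being a beginner. My goal is to use the keys to turn the values into DataFrames (e.g. Table 5). Original table:

Example DataFrame: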

d = {0: ['TB','VT','BT','CI','CH','CL','RT','RU','PV','PV','PV','PV','PV','RH','PV','PV','PV','PV','NaN','NaN','TB','VT','BT','CI','CH','CL','RT','RU','PV','PV'], 
     1: ['Table 1','BRAND. SUMMARY','Base: Floating Base (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','brand1','brand2','brand3','brand4','NPS','','NaN','Row1','Row2','Row3','NaN','NaN','Table 5','Brands Title 1','Base: All (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','Brand1','Brand2'],
     2: ['NaN','NaN','NaN','(TOTAL)','Discrete monthly banner','Sept (a)','100','997','0.31','0.31','0.31','0.31','0.31','NaN','0.62','0.64','0.61','0.6','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Total','19479','19608','0.75','0.75'],
     3: ['NaN','NaN','NaN','NaN','NaN','Oct (b)','1090','1100','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','TOTAL','Discrete monthly banner','Sept (a)','1000','1000','0.8','0.8'],
     4: ['NaN','NaN','NaN','NaN','NaN','Nov (c)','3164','3191','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Oct (b)','1000','1000','0.8','0.8'],
     5: ['NaN','NaN','NaN','NaN','NaN','Dec (d)','992','3999','0.31','0.31','0.31','0.31','0.31','NaN','0.51','0.47','0.67','0.61','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Nov (c)','1000','1000','0.8','0.8']}

When I print the table values and keys, this is returned:

Row 174 should be my column headers.

This is the code I use to grab the tables from Excel:

import pandas as pd

ws = pd.read_excel(r'C:\Users\Tables.xlsx', sheet_name= "Percents", header = None, usecols="B:XFD")

table_names = ["Table 5", "Table 8", "Table 9", "Table 40"]
groups = ws[1].isin(table_names).cumsum()
tables = {g.iloc[0,0]: g.iloc[1:24] for k,g in ws.groupby(groups)}
# Because the dict comprehension above (tables = {g.iloc...}) also returned the other groups, I filter again by table name
filtered_d = dict((k, tables[k]) for k in table_names if k in tables)

I tried to adapt the code below to return my values, but when I remove orient="index" or use orient="columns" I get an error message. I think a for loop could solve the problem.

df = pd.DataFrame.from_dict({(i,j): filtered_d[i][j] 
                           for i in filtered_d.keys() 
                           for j in filtered_d[i].keys()}, orient="index")

How can I solve this while keeping the current table layout and turning each value into a DataFrame?

Thanks in advance for any advice.

Answer 1

I am not entirely sure what output you want, but with the example provided we can give it a try. Is this what you are after?

import pandas as pd
df = pd.DataFrame({0: ['TB','VT','BT','CI','CH','CL','RT','RU','PV','PV','PV','PV','PV','RH','PV','PV','PV','PV','NaN','NaN','TB','VT','BT','CI','CH','CL','RT','RU','PV','PV'], 
     1: ['Table 1','BRAND. SUMMARY','Base: Floating Base (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','brand1','brand2','brand3','brand4','NPS','','NaN','Row1','Row2','Row3','NaN','NaN','Table 5','Brands Title 1','Base: All (TOTAL) (18-59)','NaN','NaN','NaN','Base','Unweighted row','Brand1','Brand2'],
     2: ['NaN','NaN','NaN','(TOTAL)','Discrete monthly banner','Sept (a)','100','997','0.31','0.31','0.31','0.31','0.31','NaN','0.62','0.64','0.61','0.6','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Total','19479','19608','0.75','0.75'],
     3: ['NaN','NaN','NaN','NaN','NaN','Oct (b)','1090','1100','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','TOTAL','Discrete monthly banner','Sept (a)','1000','1000','0.8','0.8'],
     4: ['NaN','NaN','NaN','NaN','NaN','Nov (c)','3164','3191','0.31','0.31','0.31','0.31','0.31','NaN','0.64','0.67','0.64','0.64','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Oct (b)','1000','1000','0.8','0.8'],
     5: ['NaN','NaN','NaN','NaN','NaN','Dec (d)','992','3999','0.31','0.31','0.31','0.31','0.31','NaN','0.51','0.47','0.67','0.61','NaN','NaN','NaN','NaN','NaN','NaN','NaN','Nov (c)','1000','1000','0.8','0.8']})
# Drop the first five rows and the leftmost column (the row-type codes)
tbl = df.drop(range(5), axis=0).drop(0, axis=1)
print(tbl)
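
The hard-coded range(5) depends on where the table happens to start. As a rough sketch only, assuming the table names sit in column 1 as they do in the example data, a block could instead be located by its name. The toy frame and the names is_start, block_id and table5 are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the sheet: column 1 holds table names and row labels.
ws = pd.DataFrame({
    0: ["TB", "CL", "PV", "PV", "TB", "CL", "PV", "PV"],
    1: ["Table 1", None, "brand1", "brand2", "Table 5", None, "Brand1", "Brand2"],
    2: [None, "Sept (a)", "0.31", "0.31", None, "Sept (a)", "0.8", "0.8"],
    3: [None, "Oct (b)", "0.31", "0.31", None, "Oct (b)", "0.8", "0.8"],
})

# Every cell that reads "Table <number>" marks the first row of a new block.
is_start = ws[1].str.fullmatch(r"Table \d+", na=False)

# A running count of the markers gives each row the number of its block.
block_id = is_start.cumsum()

# Key each block by the name found in its first row, column 1.
tables = {g.iloc[0, 1]: g for _, g in ws.groupby(block_id)}

table5 = tables["Table 5"]
print(table5)
```

Each value in tables is already a DataFrame, so tables["Table 5"] can be sliced like any other frame. Rows that come before the first table marker would land in a block of their own and may need to be discarded.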

Alternatively, you may want to name the rows and columns properly:

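# Row labels: first column of tbl, skipping the header row
index = tbl.iloc[1:, 0].tolist()
# Column labels: header row of tbl, skipping the label column
columns = tbl.iloc[0, 1:].tolist()
# Table body: drop the first six rows and the first two columns
data = df.drop(range(6), axis=0).drop(range(2), axis=1)
# Pass the raw values so pandas assigns the new labels instead of reindexing to them
tbl2 = pd.DataFrame(data.to_numpy(), index=index, columns=columns)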

Either way, hopefully you can coerce it into the right format.
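
To apply the same idea to every entry of a dictionary such as filtered_d, something along these lines might work. It is a sketch under assumptions: the helper tidy_block, its parameters header_row and label_col, and the toy data are hypothetical, and the real header row and label column have to be read off the actual sheet.

```python
import pandas as pd

def tidy_block(block, header_row, label_col):
    """Return the body of `block` with named rows and columns.

    header_row and label_col are positions inside the block
    (hypothetical parameters; adjust them to the real layout).
    """
    columns = block.iloc[header_row, label_col + 1:].tolist()
    index = block.iloc[header_row + 1:, label_col].tolist()
    body = block.iloc[header_row + 1:, label_col + 1:].to_numpy()
    out = pd.DataFrame(body, index=index, columns=columns)
    # Turn numeric strings into numbers; anything else becomes NaN.
    return out.apply(pd.to_numeric, errors="coerce")

# Toy stand-in for filtered_d: each value is already a DataFrame.
raw = {
    "Table 5": pd.DataFrame([
        ["TB", "Table 5", None, None],
        ["CL", None, "Sept (a)", "Oct (b)"],
        ["PV", "Brand1", "0.75", "0.8"],
        ["PV", "Brand2", "0.75", "0.8"],
    ]),
    "Table 8": pd.DataFrame([
        ["TB", "Table 8", None, None],
        ["CL", None, "Sept (a)", "Oct (b)"],
        ["PV", "Brand1", "0.5", "0.6"],
    ]),
}

# One cleaned frame per table name.
frames = {name: tidy_block(block, header_row=1, label_col=1)
          for name, block in raw.items()}

# Optionally stack everything under a (table, row) MultiIndex.
combined = pd.concat(frames, names=["table", "row"])

print(frames["Table 5"])
print(combined)
```

Keeping the dictionary gives one DataFrame per key, which is what the question asks for; the pd.concat step is only useful if a single frame indexed by table name is preferred.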

Solution 1
1 2021-09-21 15:43:12