[英]How to convert a column of dictionaries to separate columns in pandas?
Given the following dictionary created from df['statistics'].head().to_dict()
鉴于从
df['statistics'].head().to_dict()
创建的以下字典
{0: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
1: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
2: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
3: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
4: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}}}
Is there a way to expand the dictionary key/value pairs into their own columns and prefix these columns with the name of the original column, ie statisistics.executions.total would become statistics_executions_total or even executions_total?有没有办法将字典键/值对扩展到它们自己的列中,并用原始列的名称作为这些列的前缀,即 statisistics.executions.total 将成为 statistics_executions_total 甚至 executions_total?
I have demonstrated that I can create the columns using the following:我已经证明我可以使用以下方法创建列:
pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1)
However, you will notice that each of these newly created columns have a duplicate name "total". pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1)
但是,您会注意到这些新创建的每个列有一个重复的名称“总计”。
I;一世; however, have not been able to find a way to prefix the newly created columns with the original column name, ie executions_total.
但是,一直无法找到一种方法来为新创建的列添加原始列名的前缀,即 executions_total。
For additional insight, statistics will expand into executions and defects and executions will expand into pass |要获得更多洞察,统计数据将扩展为执行次数,缺陷和执行次数将扩展为通过 | fail |
失败 | skipped |
跳过 | total and defects will expand into automation_bug |
总和缺陷将扩展为automation_bug | system_issue |
system_issue | to_investigate |
to_investigate | product_bug |
product_bug | no_defect.
无缺陷。 The later will then expand into total |
后者随后将扩展为总计 | **001 columns where total is duplicated several times.
**001 列,其中总计重复了多次。
Any ideas are greatly appreciated.任何想法都非常感谢。 -Thanks!
-谢谢!
.apply(pd.Series)
is slow, don't use it. .apply(pd.Series)
很慢,不要使用它。
'statistics'
column from the dict
in the OP.dict
创建一个带有'statistics'
列的 DataFrame。
pandas.json_normalize
on the 'statistics'
column.'statistics'
列上使用pandas.json_normalize
。
sep
is .
sep
是.
. sep
.sep
分隔的名称。import pandas as pd
# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})
# display(df)
statistics
0 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
# normalize the statistics column
dfs = pd.json_normalize(df.statistics)
# display(dfs)
total passed failed skipped product_bug.total product_bug.PB001 automation_bug.AB001 automation_bug.total system_issue.total system_issue.SI001 to_investigate.total to_investigate.TI001 no_defect.ND001 no_defect.total
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 0 0 0 0 0 0 0 0 0 0 0 0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.