How to convert a column of dictionaries to separate columns in pandas?

Given the following dictionary, created from df['statistics'].head().to_dict():

{0: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 1: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 2: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 3: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 4: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}}}

Is there a way to expand the dictionary key/value pairs into their own columns and prefix these columns with the name of the original column, i.e. statistics.executions.total would become statistics_executions_total or even executions_total?

I have demonstrated that I can create the columns using the following:

pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1)

However, you will notice that the newly created columns each contain a duplicate name, "total".

I have not been able to find a way to prefix the newly created columns with the original column name, i.e. executions_total.

For additional insight, statistics will expand into executions and defects. executions will expand into passed | failed | skipped | total, and defects will expand into automation_bug | system_issue | to_investigate | product_bug | no_defect. The latter will then expand into total | **001 columns, where total is duplicated several times.

Any ideas are greatly appreciated. Thanks!

import pandas as pd

# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})

# display(df)
                                                                                                                                                                                                                                                                                                    statistics
0  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}

# normalize the statistics column
dfs = pd.json_normalize(df.statistics)

# display(dfs)
  executions.total executions.passed executions.failed executions.skipped  defects.product_bug.total  defects.product_bug.PB001  defects.automation_bug.AB001  defects.automation_bug.total  defects.system_issue.total  defects.system_issue.SI001  defects.to_investigate.total  defects.to_investigate.TI001  defects.no_defect.ND001  defects.no_defect.total
0                1                 1                 0                  0                          0                          0                             0                             0                           0                           0                             0                             0                        0                        0
1                1                 1                 0                  0                          0                          0                             0                             0                           0                           0                             0                             0                        0                        0
2                1                 1                 0                  0                          0                          0                             0                             0                           0                           0                             0                             0                        0                        0
3                1                 1                 0                  0                          0                          0                             0                             0                           0                           0                             0                             0                        0                        0
4                1                 1                 0                  0                          0                          0                             0                             0                           0                           0                             0                             0                        0                        0
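
To get the underscore-separated, prefixed names the question asks for, one possible follow-up is sketched below. It assumes the sep parameter of pd.json_normalize for joining the key path and DataFrame.add_prefix for prepending the original column name. The sample data is trimmed to two defect categories, and the id column is hypothetical, standing in for whatever other columns the real dataframe holds.

```python
import pandas as pd

# Trimmed sample with the same nesting as the question's data.
record = {
    'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'},
    'defects': {'product_bug': {'total': 0, 'PB001': 0},
                'no_defect': {'ND001': 0, 'total': 0}},
}
# 'id' is a hypothetical extra column standing in for the rest of the real frame.
df = pd.DataFrame({'id': [10, 11], 'statistics': [record, record]})

# Flatten every nesting level; sep='_' joins the key path with underscores,
# giving names such as executions_total and defects_product_bug_total.
flat = pd.json_normalize(df['statistics'].tolist(), sep='_')
flat.index = df.index  # keep rows aligned with the source frame

# Prefix with the original column name and swap the result in for the dict column.
result = df.drop(columns='statistics').join(flat.add_prefix('statistics_'))

print(result.columns.tolist())
```

If the shorter executions_total style is preferred, joining flat directly (without add_prefix) should give that form. Because every name carries its full key path, the repeated "total" keys no longer collide.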

 