如何将一列字典转换为熊猫中的单独列？

Question

Given the following dictionary created from df['statistics'].head().to_dict()鉴于从df['statistics'].head().to_dict()创建的以下字典

{0: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 1: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 2: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 3: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 4: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}}}

Is there a way to expand the dictionary key/value pairs into their own columns and prefix these columns with the name of the original column, ie statisistics.executions.total would become statistics_executions_total or even executions_total?有没有办法将字典键/值对扩展到它们自己的列中，并用原始列的名称作为这些列的前缀，即 statisistics.executions.total 将成为 statistics_executions_total 甚至 executions_total？

I have demonstrated that I can create the columns using the following:我已经证明我可以使用以下方法创建列：

pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1) However, you will notice that each of these newly created columns have a duplicate name "total". pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1)但是，您会注意到这些新创建的每个列有一个重复的名称“总计”。

I;一世; however, have not been able to find a way to prefix the newly created columns with the original column name, ie executions_total.但是，一直无法找到一种方法来为新创建的列添加原始列名的前缀，即 executions_total。

Any ideas are greatly appreciated.任何想法都非常感谢。 -Thanks! -谢谢！

Answer 1

.apply(pd.Series) is slow, don't use it. .apply(pd.Series)很慢，不要使用它。
- See timing in Splitting dictionary/list inside a Pandas Column into Separate Columns请参阅将 Pandas 列内的字典/列表拆分为单独的列中的时间
Create a DataFrame with a 'statistics' column from the dict in the OP.从 OP 中的dict创建一个带有'statistics'列的 DataFrame。
- This will create a DataFrame with a column of dictionaries.这将创建一个包含一列字典的 DataFrame。
Use pandas.json_normalize on the 'statistics' column.在'statistics'列上使用pandas.json_normalize 。
- The default sep is .默认的sep是. . .
  - Nested records will generate names separated by sep .嵌套记录将生成由sep分隔的名称。

import pandas as pd

# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})

# display(df)
                                                                                                                                                                                                                                                                                                    statistics
0  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}

# normalize the statistics column
dfs = pd.json_normalize(df.statistics)

# display(dfs)
  total passed failed skipped  product_bug.total  product_bug.PB001  automation_bug.AB001  automation_bug.total  system_issue.total  system_issue.SI001  to_investigate.total  to_investigate.TI001  no_defect.ND001  no_defect.total
0     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
1     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
2     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
3     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
4     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0

如何将一列字典转换为熊猫中的单独列？

问题描述

1 个解决方案

解决方案1
1 已采纳 2020-09-19 17:48:38

如何将一列字典转换为熊猫中的单独列？

问题描述

1 个解决方案

解决方案1 1 已采纳 2020-09-19 17:48:38

解决方案1
1 已采纳 2020-09-19 17:48:38