Split Pandas DataFrame rows containing dictionaries into multiple rows

Question

I have a pandas DataFrame with two columns, group_id and personal_score. The group_id column contains ID numbers, and the personal_score column contains dictionaries of varying length. Here is an example:

  group_id personal_score
0  77     {'149': 17, '819': 22}
1  37     {'821': 18, '359': 24, '089': 17, '170': 15}
2  51     {'280': 18, '261': 18, '628': 20, '722': 21, '744': 19, '152': 19}
3  84     {'140': 19}

I want to split the dictionaries in the personal_score column into two columns: personal_id, which takes the dictionary key, and score, which takes the value. The value in the group_id column should be repeated for every row produced from the corresponding dictionary. The output should look like:

  group_id     personal_id     score
0   77              149            17
1   77              819            22
2   37              821            18
3   37              359            24
4   37              089            17
5   37              170            15
6   51              280            18
7   51              261            18
8   51              628            20
9   51              722            21
10  51              744            19
11  51              152            19
12  84              140            19

Your help is much appreciated.

Answer 1

You could do:

df = pd.DataFrame([[i, k, v] for i, d in df[['group_id', 'personal_score']].values for k, v in d.items()],
                  columns=['group_id', 'personal_id', 'score'])
print(df)

Output:

    group_id personal_id  score
0         77         149     17
1         77         819     22
2         37         821     18
3         37         359     24
4         37         089     17
5         37         170     15
6         51         280     18
7         51         261     18
8         51         628     20
9         51         722     21
10        51         744     19
11        51         152     19
12        84         140     19
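
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same comprehension. The sample data is copied from the question. The names `rows` and `result` are illustrative choices, and `zip` over the two columns is swapped in for `.values` only to make the unpacking easier to read.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_id": [77, 37, 51, 84],
    "personal_score": [
        {"149": 17, "819": 22},
        {"821": 18, "359": 24, "089": 17, "170": 15},
        {"280": 18, "261": 18, "628": 20, "722": 21, "744": 19, "152": 19},
        {"140": 19},
    ],
})

# One output row per (key, value) pair; group_id is repeated for each pair.
rows = [
    (group_id, personal_id, score)
    for group_id, scores in zip(df["group_id"], df["personal_score"])
    for personal_id, score in scores.items()
]
result = pd.DataFrame(rows, columns=["group_id", "personal_id", "score"])
print(result)
```

Because the rows are built directly from the dictionaries, no NaN values are introduced and the scores stay integers.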

Answer 2

You can do it like this:

import pandas as pd

data = {"group_id": [77, 37, 51, 84],
        "personal_score": [{'149': 17, '819': 22},
                           {'821': 18, '359': 24, '089': 17, '170': 15},
                           {'280': 18, '261': 18, '628': 20, '722': 21, '744': 19, '152': 19},
                           {'140': 19}]}

df = pd.DataFrame(data)
# Expand each dict into a wide frame, then stack it into (row, personal_ID) -> score
perso_score = pd.DataFrame(df['personal_score'].values.tolist()).stack().reset_index(level=1)
perso_score.columns = ['personal_ID', 'score']
# Join on the original row index, so group_id is repeated for every score
df = df.drop(columns='personal_score').join(perso_score)

print(df)

Result:

   group_id personal_ID  score
0        77         149   17.0
0        77         819   22.0
1        37         821   18.0
1        37         359   24.0
1        37         089   17.0
1        37         170   15.0
2        51         280   18.0
2        51         261   18.0
2        51         628   20.0
2        51         722   21.0
2        51         744   19.0
2        51         152   19.0
3        84         140   19.0
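
The result above keeps the original row labels (0, 0, 1, 1, ...) and shows the scores as floats, because the wide intermediate frame contains NaN. The sketch below is one possible way to get integer scores and a fresh 0..n-1 index. The names `wide`, `long_scores` and `tidy` are illustrative, and the explicit `dropna()` is an assumption added so that the result does not depend on how a given pandas version's `stack` treats NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_id": [77, 37, 51, 84],
    "personal_score": [
        {"149": 17, "819": 22},
        {"821": 18, "359": 24, "089": 17, "170": 15},
        {"280": 18, "261": 18, "628": 20, "722": 21, "744": 19, "152": 19},
        {"140": 19},
    ],
})

# Wide intermediate: one column per personal id, NaN where a group lacks that id.
wide = pd.DataFrame(df["personal_score"].tolist(), index=df.index)

# Stack to (row, personal_id) -> score, drop the NaN cells, restore integers.
long_scores = (
    wide.stack()
        .dropna()
        .astype("int64")
        .rename("score")
        .rename_axis(["row", "personal_id"])
)

# Join back on the original row label, then renumber the rows.
tidy = (
    df.drop(columns="personal_score")
      .join(long_scores.reset_index(level="personal_id"))
      .reset_index(drop=True)
)
print(tidy)
```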

Answer 3

Let's be lazy and try:

# this creates a lot of unnecessary NaN in the data
# and we use `stack` to kill them
# that's why we say `lazy`
(pd.DataFrame(df.set_index('group_id')['personal_score'].to_dict())
   .stack()
   .rename_axis(index=('personal_id','group_id'))
   .reset_index(name='score')
)

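Another way is to use a dict comprehension with concat:

(pd.concat({x: pd.DataFrame({'score': y})
            for x, y in zip(df['group_id'], df['personal_score'])
           })
   # the concat keys (group_id) form the outer index level
   .rename_axis(['group_id', 'personal_id'])
   .reset_index()
)

Output of the first snippet (the concat version gives the same rows, with group_id as the first column and integer scores):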

   personal_id  group_id  score
0          149        77   17.0
1          819        77   22.0
2          821        37   18.0
3          359        37   24.0
4          089        37   17.0
5          170        37   15.0
6          280        51   18.0
7          261        51   18.0
8          628        51   20.0
9          722        51   21.0
10         744        51   19.0
11         152        51   19.0
12         140        84   19.0
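
To make the "unnecessary NaN" remark concrete, the following sketch builds only the wide intermediate frame from the first snippet and counts its empty cells. The sample data is copied from the question. The names `wide` and `n_missing` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_id": [77, 37, 51, 84],
    "personal_score": [
        {"149": 17, "819": 22},
        {"821": 18, "359": 24, "089": 17, "170": 15},
        {"280": 18, "261": 18, "628": 20, "722": 21, "744": 19, "152": 19},
        {"140": 19},
    ],
})

# Columns are group ids, rows are every personal id seen in any group.
wide = pd.DataFrame(df.set_index("group_id")["personal_score"].to_dict())
n_missing = int(wide.isna().sum().sum())

print(wide.shape)   # (13, 4)
print(n_missing)    # 39 of the 52 cells are NaN
```

The intermediate frame grows with (number of groups) x (number of distinct personal ids), so on large inputs the comprehension or concat approaches are likely to use less memory.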
