Split a pandas dataframe row containing a dictionary into multiple rows
I have a pandas dataframe with two columns, group_id and personal_score. The group_id column contains ID numbers, while the personal_score column contains dictionaries of varying length. Here is an example:
group_id personal_score
0 77 {'149': 17, '819': 22}
1 37 {'821': 18, '359': 24, '089': 17, '170': 15}
2 51 {'280': 18, '261': 18, '628': 20, '722': 21, '744': 19, '152': 19}
3 84 {'140': 19}
I want to split the dictionaries in the personal_score column into two columns: personal_id, which takes the dictionary key, and score, which takes the value. The value in the group_id column should be repeated for every row produced from the corresponding dictionary. The output should look like:
group_id personal_id score
0 77 149 17
1 77 819 22
2 37 821 18
3 37 359 24
4 37 089 17
5 37 170 15
6 51 280 18
7 51 261 18
8 51 628 20
9 51 722 21
10 51 744 19
11 51 152 19
12 84 140 19
Your help is much appreciated.
You could do:
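# assumes `import pandas as pd` and the `df` from the question
# one output row per (group_id, key, value) triple
df = pd.DataFrame(
    [[i, k, v] for i, d in df[['group_id', 'personal_score']].values for k, v in d.items()],
    columns=['group_id', 'personal_id', 'score'])
print(df)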
Output:
group_id personal_id score
0 77 149 17
1 77 819 22
2 37 821 18
3 37 359 24
4 37 089 17
5 37 170 15
6 51 280 18
7 51 261 18
8 51 628 20
9 51 722 21
10 51 744 19
11 51 152 19
12 84 140 19
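For a version that can be pasted and run as-is, here is a minimal sketch of the same comprehension with the sample data from the question. The names `rows` and `long_df` are only illustrative, and `zip` over the two columns is swapped in for `.values`; it iterates the same pairs without first building an intermediate object array.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group_id": [77, 37, 51, 84],
    "personal_score": [
        {'149': 17, '819': 22},
        {'821': 18, '359': 24, '089': 17, '170': 15},
        {'280': 18, '261': 18, '628': 20, '722': 21, '744': 19, '152': 19},
        {'140': 19},
    ],
})

# One (group_id, personal_id, score) tuple per dictionary entry
rows = [
    (group_id, personal_id, score)
    for group_id, scores in zip(df["group_id"], df["personal_score"])
    for personal_id, score in scores.items()
]

# `long_df` is an illustrative name, not taken from the answer
long_df = pd.DataFrame(rows, columns=["group_id", "personal_id", "score"])
print(long_df)
```

Because no NaN placeholders are ever created, the score column should keep an integer dtype here, and the personal_id keys stay strings (so '089' keeps its leading zero).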
You can do it like this:
import pandas as pd

data = {"group_id": [77, 37, 51, 84],
        "personal_score": [{'149': 17, '819': 22},
                           {'821': 18, '359': 24, '089': 17, '170': 15},
                           {'280': 18, '261': 18, '628': 20, '722': 21, '744': 19, '152': 19},
                           {'140': 19}]}
df = pd.DataFrame(data)
# one column per dictionary key, then stack back into (row, key) -> score
perso_score = pd.DataFrame(df['personal_score'].values.tolist()).stack().reset_index(level=1)
perso_score.columns = ['personal_ID', 'score']
df.drop(columns='personal_score', inplace=True)
# join on the original row index, which repeats group_id for each key
df = df.join(perso_score)
print(df)
Result:
group_id personal_ID score
0 77 149 17.0
0 77 819 22.0
1 37 821 18.0
1 37 359 24.0
1 37 089 17.0
1 37 170 15.0
2 51 280 18.0
2 51 261 18.0
2 51 628 20.0
2 51 722 21.0
2 51 744 19.0
2 51 152 19.0
3 84 140 19.0
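The result above keeps the original row labels (0, 0, 1, ...) and shows the scores as floats, because the intermediate wide frame holds NaN wherever a key is missing. If a fresh index and integer scores are wanted, a possible cleanup is sketched below on a shortened version of the data. The names `scores` and `tidy` are hypothetical, and the explicit `dropna()` is an added safeguard for pandas versions whose `stack` keeps NaN entries.

```python
import pandas as pd

# Shortened sample, same shape as the question's data
df = pd.DataFrame({
    "group_id": [77, 37],
    "personal_score": [{'149': 17, '819': 22}, {'821': 18}],
})

scores = (
    pd.DataFrame(df["personal_score"].tolist())  # wide frame, NaN where a key is absent
    .stack()                                     # back to long form, indexed by (row, key)
    .dropna()                                    # safeguard: drop any NaN that stack kept
    .reset_index(level=1)                        # move the key out of the index
)
scores.columns = ["personal_id", "score"]

tidy = (
    df.drop(columns="personal_score")
    .join(scores)                  # aligns on the original row labels
    .astype({"score": int})        # floats came from the NaN padding
    .reset_index(drop=True)        # replace 0, 0, 1 with 0, 1, 2
)
print(tidy)
```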
Let's be lazy and try:
# this creates a lot of unnecessary NaN in the data
# and we use `stack` to kill them
# that's why we say `lazy`
(pd.DataFrame(df.set_index('group_id')['personal_score'].to_dict())
.stack()
.rename_axis(index=('personal_id','group_id'))
.reset_index(name='score')
)
Another way is to use a dict comprehension and concat:
(pd.concat({x: pd.DataFrame({'score': y})
            for x, y in zip(df['group_id'], df['personal_score'])
           })
   # concat puts the dict keys (group_id) on the outer index level
   .rename_axis(['group_id', 'personal_id'])
   .reset_index()
)
Output (from the `stack` version):
personal_id group_id score
0 149 77 17.0
1 819 77 22.0
2 821 37 18.0
3 359 37 24.0
4 089 37 17.0
5 170 37 15.0
6 280 51 18.0
7 261 51 18.0
8 628 51 20.0
9 722 51 21.0
10 744 51 19.0
11 152 51 19.0
12 140 84 19.0
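One caveat worth checking before using either dict-based version: both key a Python dict by group_id, so if the same group_id appears on more than one row, only one of those rows survives. The sketch below uses made-up data (group 77 repeated) to illustrate this, and contrasts it with a plain list comprehension, which keeps every row; `as_dict` and `flat` are illustrative names.

```python
import pandas as pd

# Hypothetical data: group 77 appears on two separate rows
df = pd.DataFrame({
    "group_id": [77, 77],
    "personal_score": [{'149': 17}, {'819': 22}],
})

# Keying by group_id collapses the duplicate rows into one entry
as_dict = df.set_index("group_id")["personal_score"].to_dict()
print(as_dict)

# A row-by-row comprehension does not depend on group_id being unique
flat = pd.DataFrame(
    [(g, k, v)
     for g, d in zip(df["group_id"], df["personal_score"])
     for k, v in d.items()],
    columns=["group_id", "personal_id", "score"],
)
print(flat)
```

With unique group_id values, as in the question, this difference does not arise.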