Python: How to transpose value counts from one pandas DataFrame into multiple columns of a second DataFrame?

Question

I have two DataFrames, df1 and df2:

import pandas as pd

df1 = pd.DataFrame({
    'id':['1','1','1','2','2','2', '3', '4','4', '5', '6', '7'],
    'group':['A','A','B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})

df2 = pd.DataFrame({
    'id':['1','2','3','4','5','6','7']
})

I want to add three columns to df2, named group_A, group_B, and group_C. Each one should count how many times that group appears in df1 for the corresponding id, so the resulting df2 has one count column per group.
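To pin down exactly what is being counted, here is a minimal plain-Python sketch (not part of the original question) that tallies each (id, group) pair in df1 with collections.Counter and then builds one column per group. The name pair_counts is illustrative only; the pandas answers below do the same thing more idiomatically.

```python
from collections import Counter

import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
df2 = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7']})

# Count how often each (id, group) pair occurs in df1.
pair_counts = Counter(zip(df1['id'], df1['group']))

# Add one column per distinct group; Counter returns 0 for unseen pairs.
result = df2.copy()
for g in sorted(df1['group'].unique()):
    result[f'group_{g}'] = [pair_counts[(i, g)] for i in result['id']]

print(result)
```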

Answer 1

Use crosstab with DataFrame.join. The id columns in both DataFrames must have the same dtype (here, both are strings):

print (pd.crosstab(df1['id'], df1['group']).add_prefix('group_'))
group  group_A  group_B  group_C
id                              
1            2        1        0
2            2        0        1
3            1        0        0
4            1        1        0
5            0        1        0
6            1        0        0
7            0        0        1
    
df = df2.join(pd.crosstab(df1['id'], df1['group']).add_prefix('group_'), on='id')
print (df)
  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
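If the id dtypes do not match, cast one side before joining. A minimal sketch, assuming a hypothetical df2_int whose ids are stored as integers (for example after reading a CSV): here df2's ids are converted to strings so they match the string index produced by crosstab. Casting the crosstab index to int instead would work equally well.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
# Hypothetical variant of df2 with integer ids instead of strings.
df2_int = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7]})

counts = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')

# Align the key dtypes: turn df2's integer ids into strings, then join.
df = df2_int.assign(id=df2_int['id'].astype(str)).join(counts, on='id')
print(df)
```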

If both DataFrames contain the same ids, you can skip the join:

print (pd.crosstab(df1['id'], df1['group']).add_prefix('group_').reset_index().rename_axis(None, axis=1))
  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
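When the ids are not identical, one way to keep the no-join style is to reindex the crosstab on df2's ids with fill_value=0. This is a sketch under the assumption of a hypothetical df2_extra containing an id ('8') that never occurs in df1; rows then follow df2's order and unseen ids get zero counts.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
# Hypothetical df2 with an extra id that is missing from df1.
df2_extra = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8']})

counts = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')

# Reindex on df2's ids: order follows df2, missing ids are filled with 0.
out = (counts.reindex(df2_extra['id'].tolist(), fill_value=0)
             .reset_index()
             .rename_axis(None, axis=1))
print(out)
```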

Answer 2

One option is to compute the counts from df1 first, then join them to df2:

counts = df1.value_counts().unstack(fill_value=0).add_prefix('group_')
df2.join(counts, on='id')

  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
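Note that DataFrame.value_counts() (available since pandas 1.1) counts unique rows across all columns, so it only works as-is because df1 has just id and group. A minimal sketch, assuming a hypothetical df1_wide with an extra score column: select the two key columns before counting.

```python
import pandas as pd

# Hypothetical df1 with an extra column that must not affect the counts.
df1_wide = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
    'score': range(12),
})
df2 = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7']})

# Restrict value_counts to the key columns, then spread groups into columns.
counts = (df1_wide[['id', 'group']].value_counts()
          .unstack(fill_value=0)
          .add_prefix('group_'))
result = df2.join(counts, on='id')
print(result)
```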

Another option is to use get_dummies combined with groupby:

counts = pd.get_dummies(df1, columns=['group']).groupby('id').sum()

df2.join(counts, on='id')

  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
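A variant of the same idea, sketched here as an assumption rather than the answerer's code: encode only the group Series and group the indicator rows by df1's id. This ignores any other columns df1 might have, and dtype=int yields integer indicators directly (recent pandas versions default to bool).

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
df2 = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7']})

# One integer indicator column per group: group_A, group_B, group_C.
dummies = pd.get_dummies(df1['group'], prefix='group', dtype=int)

# Sum the indicators per id (rows are aligned on df1's index).
counts = dummies.groupby(df1['id']).sum()
result = df2.join(counts, on='id')
print(result)
```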

Answer 3

Another option is to group by ['id', 'group'], take the size of each group, and unstack:

out = (df1.groupby(['id','group']).size().unstack(fill_value=0)
       .add_prefix('group_').reset_index().rename_axis([None], axis=1)
       .merge(df2, on='id'))

Output:

  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
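merge defaults to how='inner', so any df2 id that never appears in df1 would be dropped from this result. A sketch, assuming a hypothetical df2_extra with an unmatched id ('8'): put df2 on the left with how='left', then turn the resulting NaNs into zero counts.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
# Hypothetical df2 containing an id that df1 does not have.
df2_extra = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8']})

counts = (df1.groupby(['id', 'group']).size().unstack(fill_value=0)
             .add_prefix('group_').reset_index().rename_axis(None, axis=1))

# Left merge keeps every df2 row in df2's order; unmatched ids get NaN.
out = df2_extra.merge(counts, on='id', how='left')
group_cols = [c for c in out.columns if c.startswith('group_')]
out[group_cols] = out[group_cols].fillna(0).astype(int)
print(out)
```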

3 answers

Answer 1: score 2, accepted, 2021-12-20 06:33:14

Answer 2: score 0, 2021-12-20 06:39:54

Answer 3: score 0, 2021-12-20 07:12:42
