Python: How to transpose the count of values in one pandas DataFrame to multiple columns in a second DataFrame?
I have two data frames, df1 and df2:
import pandas as pd
df1 = pd.DataFrame({
'id':['1','1','1','2','2','2', '3', '4','4', '5', '6', '7'],
'group':['A','A','B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})
df2 = pd.DataFrame({
'id':['1','2','3','4','5','6','7']
})
I want to add three columns to df2, named group_A, group_B, and group_C, where each one counts how many times that group occurs in df1 for the corresponding id. So the resulting df2 should look like this:
Use crosstab with DataFrame.join. The id columns of both DataFrames must have the same dtype, here strings:
print (pd.crosstab(df1['id'], df1['group']).add_prefix('group_'))
group group_A group_B group_C
id
1 2 1 0
2 2 0 1
3 1 0 0
4 1 1 0
5 0 1 0
6 1 0 0
7 0 0 1
df = df2.join(pd.crosstab(df1['id'], df1['group']).add_prefix('group_'), on='id')
print (df)
id group_A group_B group_C
0 1 2 1 0
1 2 2 0 1
2 3 1 0 0
3 4 1 1 0
4 5 0 1 0
5 6 1 0 0
6 7 0 0 1
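The dtype requirement matters because join matches id values exactly, and the string '1' is not equal to the integer 1. A minimal sketch, assuming a hypothetical df2_int whose id column holds integers: cast the crosstab index to int before joining so both keys share a dtype.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
# Hypothetical variant of df2 that stores id as integers instead of strings
df2_int = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7]})

counts = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')
# Align key dtypes: the crosstab index holds strings, df2_int['id'] holds ints
counts.index = counts.index.astype(int)

df = df2_int.join(counts, on='id')
print(df)
```

Casting df2's id to str instead would work equally well; which side to convert depends on which dtype the rest of your code expects.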
A solution without join is possible if both DataFrames contain the same ids:
print (pd.crosstab(df1['id'], df1['group']).add_prefix('group_').reset_index().rename_axis(None, axis=1))
id group_A group_B group_C
0 1 2 1 0
1 2 2 0 1
2 3 1 0 0
3 4 1 1 0
4 5 0 1 0
5 6 1 0 0
6 7 0 0 1
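The join version is the one to keep when the ids differ. As a sketch, assume a hypothetical df2 with an extra id '8' that never occurs in df1: the left join keeps that row but fills its counts with NaN, which can then be replaced by 0.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})
# Hypothetical df2 with an id ('8') that has no rows in df1
df2 = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8']})

counts = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')
cols = list(counts.columns)

# join is a left join by default: every df2 row is kept, unmatched ids get NaN
df = df2.join(counts, on='id')
# Turn the missing counts into 0 and restore an integer dtype
df[cols] = df[cols].fillna(0).astype(int)
print(df)
```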
One option is to compute the counts from df1 first and then join them to df2:
counts = df1.value_counts().unstack(fill_value=0).add_prefix('group_')
df2.join(counts, on='id')
id group_A group_B group_C
0 1 2 1 0
1 2 2 0 1
2 3 1 0 0
3 4 1 1 0
4 5 0 1 0
5 6 1 0 0
6 7 0 0 1
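To see what value_counts does here, it helps to look at the intermediate Series. A short sketch, assuming pandas 1.1 or later (where DataFrame.value_counts is available): counting the rows of df1 gives one entry per distinct (id, group) pair, and unstack pivots the group level into columns, filling absent pairs with 0.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})

# One count per distinct (id, group) row, indexed by a two-level MultiIndex
pair_counts = df1.value_counts()
print(pair_counts.index.names)

# Move the inner 'group' level into columns; pairs that never occur become 0
counts = pair_counts.unstack(fill_value=0).add_prefix('group_')
print(counts)
```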
Another option is get_dummies combined with groupby:
counts = pd.get_dummies(df1, columns=['group']).groupby('id').sum()
df2.join(counts, on='id')
id group_A group_B group_C
0 1 2 1 0
1 2 2 0 1
2 3 1 0 0
3 4 1 1 0
4 5 0 1 0
5 6 1 0 0
6 7 0 0 1
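In pandas 2.0 and later, get_dummies returns boolean indicator columns by default; summing booleans per group still yields counts, but passing dtype=int makes the 0/1 encoding explicit. A sketch of the intermediate steps:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})

# One 0/1 indicator column per group value: group_A, group_B, group_C
dummies = pd.get_dummies(df1, columns=['group'], dtype=int)
print(dummies.head())

# Summing the indicators within each id gives the per-group counts
counts = dummies.groupby('id').sum()
print(counts)
```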
Another option is to groupby on ['id', 'group'], apply size, and unstack:
out = (df1.groupby(['id','group']).size().unstack(fill_value=0)
.add_prefix('group_').reset_index().rename_axis([None], axis=1)
.merge(df2, on='id'))
Output:
id group_A group_B group_C
0 1 2 1 0
1 2 2 0 1
2 3 1 0 0
3 4 1 1 0
4 5 0 1 0
5 6 1 0 0
6 7 0 0 1
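Since all of the answers print the same table, one way to check that is to build each count table and compare them. This sketch reuses the calls from the answers (adding dtype=int to get_dummies, an assumption for clarity) and sorts both axes before comparing, because the methods do not promise the same row or column order, axis names, or integer dtype.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C'],
})

# Each approach from the answers, producing an id x group_* count table
by_crosstab = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')
by_value_counts = df1.value_counts().unstack(fill_value=0).add_prefix('group_')
by_dummies = pd.get_dummies(df1, columns=['group'], dtype=int).groupby('id').sum()
by_size = df1.groupby(['id', 'group']).size().unstack(fill_value=0).add_prefix('group_')

reference = by_crosstab.sort_index().sort_index(axis=1)
for other in (by_value_counts, by_dummies, by_size):
    # Ignore axis names and dtype differences; compare labels and values only
    pd.testing.assert_frame_equal(
        reference,
        other.sort_index().sort_index(axis=1),
        check_dtype=False,
        check_names=False,
    )
print("all four count tables agree")
```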