
Python: how to transpose the count of values in one pandas data frame to multiple columns in a second data frame?

I have 2 data frames, df1 and df2.

import pandas as pd

df1 = pd.DataFrame({
    'id':['1','1','1','2','2','2', '3', '4','4', '5', '6', '7'],
    'group':['A','A','B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})

df2 = pd.DataFrame({
    'id':['1','2','3','4','5','6','7']
})

I want to add 3 columns to df2 named group_A, group_B, and group_C, where each column counts how many times that group appears in df1 for the given id. So the result of df2 should look like this:

(Sample output image)

Use crosstab with DataFrame.join. The id columns in both frames have to be of the same type (here both are strings):

print (pd.crosstab(df1['id'], df1['group']).add_prefix('group_'))
group  group_A  group_B  group_C
id                              
1            2        1        0
2            2        0        1
3            1        0        0
4            1        1        0
5            0        1        0
6            1        0        0
7            0        0        1
    
df = df2.join(pd.crosstab(df1['id'], df1['group']).add_prefix('group_'), on='id')
print (df)
  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
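The point about matching types is easy to trip over: if id is stored as integers in one frame and as strings in the other, the keys will not line up as intended. A minimal sketch of one way to handle that, assuming a hypothetical df2_int whose ids are integers (this frame is not part of the question), is to cast the key before joining:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})

# Hypothetical variant of df2 with integer ids (assumption for illustration)
df2_int = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7]})

counts = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')

# Cast the key to the same type as the crosstab index (strings here)
df2_fixed = df2_int.assign(id=df2_int['id'].astype(str))

result = df2_fixed.join(counts, on='id')
print(result)
```

Casting in the other direction (df1['id'].astype(int) before building the crosstab) would work just as well; what matters is that both sides agree.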

A solution without join is also possible if both DataFrames contain the same ids:

print (pd.crosstab(df1['id'], df1['group']).add_prefix('group_').reset_index().rename_axis(None, axis=1))
  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
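When the ids differ, the join is still needed. A rough sketch, assuming a hypothetical df2_extra that contains an id ('8') which never appears in df1: the join leaves NaN in the count columns for that row, so the counts are filled with 0 and cast back to integers afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})

# Hypothetical df2 with an extra id '8' that is absent from df1
df2_extra = pd.DataFrame({'id': ['1', '2', '3', '4', '5', '6', '7', '8']})

counts = pd.crosstab(df1['id'], df1['group']).add_prefix('group_')

# join is a left join by default: every row of df2_extra is kept,
# and ids without a match get NaN in the count columns
joined = df2_extra.join(counts, on='id')

# Replace missing counts with 0 and restore an integer dtype
joined = joined.fillna(0).astype({col: int for col in counts.columns})
print(joined)
```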

One option is to get the counts from df1 first, and then join them to df2:

counts = df1.value_counts().unstack(fill_value=0).add_prefix('group_')
df2.join(counts, on='id')

  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
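To see what the one-liner is doing, it may help to split it into steps. This is only an illustrative breakdown; the intermediate names pair_counts and wide are made up here, and DataFrame.value_counts needs pandas 1.1 or later.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})

# Step 1: count each distinct (id, group) row; the result is a Series
# indexed by the two levels 'id' and 'group'
pair_counts = df1.value_counts()

# Step 2: move the innermost level ('group') into the columns,
# filling (id, group) pairs that never occur with 0
wide = pair_counts.unstack(fill_value=0)

# Step 3: prefix the column labels so they read group_A, group_B, group_C
counts = wide.add_prefix('group_')
print(counts)
```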

Another option is get_dummies, combined with groupby:

counts = pd.get_dummies(df1, columns = ['group']).groupby('id').sum()

df2.join(counts, on='id')

  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1

Another option is to groupby on ['id', 'group'], then apply size and unstack:

out = (df1.groupby(['id','group']).size().unstack(fill_value=0)
       .add_prefix('group_').reset_index().rename_axis([None], axis=1)
       .merge(df2, on='id'))

Output:

  id  group_A  group_B  group_C
0  1        2        1        0
1  2        2        0        1
2  3        1        0        0
3  4        1        1        0
4  5        0        1        0
5  6        1        0        0
6  7        0        0        1
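Note that the merge above starts from the counts, so the rows come out in the order of the counts table rather than in the order of df2. If df2's row order matters, one possible variation is to merge from df2's side instead. A small sketch, assuming a hypothetical df2_alt with shuffled ids:

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '1', '1', '2', '2', '2', '3', '4', '4', '5', '6', '7'],
    'group': ['A', 'A', 'B', 'A', 'A', 'C', 'A', 'A', 'B', 'B', 'A', 'C']
})

# Hypothetical df2 whose ids are not in sorted order
df2_alt = pd.DataFrame({'id': ['7', '1', '3']})

counts = (df1.groupby(['id', 'group']).size().unstack(fill_value=0)
          .add_prefix('group_').reset_index().rename_axis(None, axis=1))

# Merging from df2_alt with how='left' keeps its rows and their order
out = df2_alt.merge(counts, on='id', how='left')
print(out)
```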

