Pandas DataFrame: How to add a column with the number of occurrences in another column

Question

I have the following df:

Col1    Col2
test    Something
test2   Something
test3   Something
test    Something
test2   Something
test5   Something

I would like to get:

Col1    Col2          Occur
test    Something     2
test2   Something     2
test3   Something     1
test    Something     2
test2   Something     2
test5   Something     1

I tried using:

df["Occur"] = df["Col1"].value_counts()

But it did not help. I end up with an Occur column full of NaN.

Answer 1

Group by 'Col1' and then apply transform to Col2. transform returns a Series whose index is aligned with the original df, so you can add it directly as a column:

In [3]:
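# 'count' replaces the original pd.Series.value_counts, which only fits when
# Col2 holds one distinct value per group and may fail on recent pandas versions
df['Occur'] = df.groupby('Col1')['Col2'].transform('count')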
df

Out[3]:
    Col1       Col2 Occur
0   test  Something     2
1  test2  Something     2
2  test3  Something     1
3   test  Something     2
4  test2  Something     2
5  test5  Something     1
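
The alignment point can be checked with a minimal sketch. It rebuilds the question's frame (the constructor call is an assumption, since the question only shows the printed table) and uses 'count' as the aggregation, a choice made for this illustration. value_counts() is indexed by the distinct Col1 values, so assigning it to df finds no matching row labels and yields NaN, whereas the transform result shares df's index.

```python
import pandas as pd

# Rebuild the frame from the question (assumed construction).
df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# value_counts() is indexed by the distinct Col1 values, not by row label.
counts = df["Col1"].value_counts()

# Column assignment aligns on the index; no row label matches, so all NaN.
naive = df.assign(Occur=counts)
print(naive["Occur"].isna().all())

# transform returns one value per original row, on the original index.
aligned = df.groupby("Col1")["Col2"].transform("count")
print(aligned.index.equals(df.index))
print(aligned.tolist())
```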

Answer 2

You can also use GroupBy + transform with size:

df['Occur'] = df.groupby('Col1')['Col1'].transform('size')

print(df)

    Col1       Col2  Occur
0   test  Something      2
1  test2  Something      2
2  test3  Something      1
3   test  Something      2
4  test2  Something      2
5  test5  Something      1
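
One practical difference between 'size' and 'count' is how missing values are treated: 'size' counts every row in the group, while 'count' skips nulls in the selected column. The sketch below shows this on a small frame with one missing Col2 entry. The data is made up for illustration and is not part of the question.

```python
import pandas as pd

# Hypothetical data: one Col2 value is missing.
df = pd.DataFrame({
    "Col1": ["test", "test2", "test", "test2"],
    "Col2": ["Something", None, "Something", "Something"],
})

grouped = df.groupby("Col1")["Col2"]

# 'size' counts rows per group, including rows where Col2 is null.
size = grouped.transform("size")

# 'count' counts only the non-null Col2 values per group.
count = grouped.transform("count")

print(size.tolist())
print(count.tolist())
```

If the goal is the number of times each Col1 value occurs, 'size' is the safer choice when other columns may contain missing values.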

Answer 3

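I could not get the other answers to work when I wanted to keep more columns than just Col1 and Col2. The following keeps any number of other columns and worked well for me.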

df['Occur'] = df['Col1'].apply(lambda x: (df['Col1'] == x).sum())
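
The apply call above compares every row against the whole column, so its cost grows quadratically with the number of rows. A possible alternative, sketched here as a suggestion rather than as part of the original answer, is to compute the counts once with value_counts() and look them up with map. The Extra column is hypothetical and only shows that other columns are left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
    "Extra": range(6),  # hypothetical additional column
})

# Approach from the answer: one full-column comparison per row.
by_apply = df["Col1"].apply(lambda x: (df["Col1"] == x).sum())

# Alternative: count each distinct value once, then look up per row.
by_map = df["Col1"].map(df["Col1"].value_counts())

df["Occur"] = by_map
print(df)
```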

Solution 1
5 Accepted 2016-05-06 17:08:00

Solution 2
5 2018-09-10 09:41:09

Solution 3
0 2020-07-15 08:02:22
