An efficient way to count occurrences of a series' values by row in pandas

Question

I have a large dataframe for which I want to count, by row, the number of occurrences of a series of specific values (given by an external function). For reproducibility, let's assume the following simplified dataframe:

import pandas as pd
data = {'A': [3, 2, 1, 0], 'B': [4, 3, 2, 1], 'C': [1, 2, 3, 4], 'D': [1, 1, 2, 2], 'E': [4, 4, 4, 4]}
df = pd.DataFrame.from_dict(data)
df
   A  B  C  D  E
0  3  4  1  1  4
1  2  3  2  1  4
2  1  2  3  2  4
3  0  1  4  2  4

How can I count, by row, the number of occurrences of specific values (given by a series with one entry per row)?

Again for simplicity, let's assume this values_series is given by the max of each row.

values_series = df.max(axis=1)
0    4
1    4
2    4
3    4
dtype: int64

The solution I arrived at does not seem very pythonic (e.g. I'm using iterrows(), which is slow):

max_count = []
for index, row in df.iterrows():
    # Look up how often this row's target value occurs within the row.
    max_count.append(row.value_counts()[values_series.loc[index]])
df_counts = pd.Series(max_count)

Is there a more efficient way to do this?

Answer 1

We can compare the transposed df.T directly to the df.max(axis=1) series, thanks to broadcasting (the series is matched against the columns of df.T, which are the rows of df):

(df.T == df.max(axis=1)).sum()

# result
0    2
1    1
2    1
3    2
dtype: int64

(Transposing also has the added benefit that we can use sum without specifying the axis, i.e. with the default axis=0.)
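To make the alignment in this answer explicit, here is a minimal, self-contained sketch using the question's sample data. It assumes values_series carries the same index labels as df; the names mask and counts_transposed are introduced here only for illustration.

```python
import pandas as pd

data = {'A': [3, 2, 1, 0], 'B': [4, 3, 2, 1], 'C': [1, 2, 3, 4],
        'D': [1, 1, 2, 2], 'E': [4, 4, 4, 4]}
df = pd.DataFrame(data)
values_series = df.max(axis=1)

# df.T has the original row labels (0..3) as its columns, so the comparison
# pairs values_series[i] with column i of df.T (that is, row i of df).
# Assumption: values_series and df share the same index labels.
mask = df.T == values_series

# Summing down each column (the default axis=0) counts matches per original row.
counts_transposed = mask.sum()
print(counts_transposed.tolist())  # [2, 1, 1, 2]
```

Note that the transpose of a large frame with mixed dtypes may be costly, so it is probably worth timing this against the alternatives on real data.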

Answer 2

You can try:

df.eq(df.max(axis=1), axis=0).sum(axis=1)
Out[361]: 
0    2
1    1
2    1
3    2
dtype: int64
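Since the question says the target values actually come from an external function, the same eq pattern can be sketched for an arbitrary series. The helper count_row_matches and the targets values below are hypothetical, made up purely for illustration; the only assumption is that targets has one entry per row of df, with matching index labels.

```python
import pandas as pd


def count_row_matches(frame: pd.DataFrame, targets: pd.Series) -> pd.Series:
    """Hypothetical helper: per row, count the cells equal to that row's target."""
    # axis=0 matches targets to rows by index label instead of to columns.
    return frame.eq(targets, axis=0).sum(axis=1)


data = {'A': [3, 2, 1, 0], 'B': [4, 3, 2, 1], 'C': [1, 2, 3, 4],
        'D': [1, 1, 2, 2], 'E': [4, 4, 4, 4]}
df = pd.DataFrame(data)

# Made-up target values standing in for the external function's output.
targets = pd.Series([1, 2, 3, 4], index=df.index)

row_counts = count_row_matches(df, targets)
print(row_counts.tolist())  # [2, 2, 1, 2]
```

Passing df.max(axis=1) as targets should reproduce the counts shown in this answer.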

Answer 3

A perfect job for NumPy broadcasting:

a = df.to_numpy()
b = values_series.to_numpy()[:, None]

(a == b).sum(axis=1)
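For completeness, a self-contained version of the snippet above, with the result wrapped back into a pandas Series, might look like the following sketch. The names np_counts and df_counts are chosen here for illustration. Because NumPy compares by position and ignores index labels, this assumes values_series is already in the same row order as df.

```python
import pandas as pd

data = {'A': [3, 2, 1, 0], 'B': [4, 3, 2, 1], 'C': [1, 2, 3, 4],
        'D': [1, 1, 2, 2], 'E': [4, 4, 4, 4]}
df = pd.DataFrame(data)
values_series = df.max(axis=1)

a = df.to_numpy()                       # shape (4, 5): one row per df row
b = values_series.to_numpy()[:, None]   # shape (4, 1): one target per row

# Comparing (4, 5) with (4, 1) broadcasts each row's target across its columns.
np_counts = (a == b).sum(axis=1)

# Reattach the original index, since the NumPy arrays carry no labels.
df_counts = pd.Series(np_counts, index=df.index)
print(df_counts.tolist())  # [2, 1, 1, 2]
```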

3 answers

Answer 1
3 (accepted) 2022-04-16 12:00:43

Answer 2
3 2022-04-16 12:14:19

Answer 3
2 2022-04-16 11:58:07
