Pandas loop to NumPy: counting occurrences of a string in an array with count_nonzero

Question

Suppose I have the following dataframe, with element types shown in parentheses:

  Column1(int) Column2(str)  Column3(str)
0     2             02            34
1     2             34            02
2     2             80            85
3     2             91            09
4     2             09            34
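
For reproducibility, the frame above could be built roughly as follows. This is a minimal sketch: it assumes Column2 and Column3 really are strings (so leading zeros such as "02" are preserved) and that the variable is called df, as in the code further down.

```python
import pandas as pd

# Sample data from the table above. Column2 and Column3 are kept as
# strings so that leading zeros such as "02" survive.
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2],
        "Column2": ["02", "34", "80", "91", "09"],
        "Column3": ["34", "02", "85", "09", "34"],
    }
)
print(df)
```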

When looping in pandas I use the following code. If Column1 == 2, count how many times the Column2 value occurs in Column3 and assign that count to Column4:

import pandas as pd

for index in df.index:
    if df.loc[index, "Column1"] == 2:
        df.loc[index, "Column4"] = df.loc[
            df.Column3 == df.loc[index, "Column2"], "Column3"
        ].count()

I am trying to use NumPy and array methods for efficiency. I have tried translating the loop, but with no luck.

import numpy as np

# turn Column3 to array
array = df.loc[:, "Column3"].values

index = df.index
df.assign(
    Column4=lambda x: np.where(
        (x["Column1"] == 2), np.count_nonzero(array == df.loc[index, "Column2"]), "F"
    )
)

Expected output

  Column1(int) Column2(str)  Column3(str)  Column4(int)
0     2             02            34           1
1     2             34            02           2
2     2             80            85           0
3     2             91            09           0
4     2             09            34           1

Answer 1

You can call pd.Series.value_counts on Column3 and use the result as a mapping for Column2, since pd.Series.map accepts a Series object. Column2 values that never occur in Column3 come back as missing, so fill them with 0 using pd.Series.fillna.

s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df.loc[df['Column1'].eq(2), 'Column4'] = s
df['Column4'] = df['Column4'].fillna('F') 
# Fills with 'F' where `Column1` is not equal to 2.

   Column1  Column2  Column3  Column4
0        2        2       34      1.0
1        2       34        2      2.0
2        2       80       85      0.0
3        2       91        9      0.0
4        2        9       34      1.0
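
Column4 shows up as float above because the missing values pass through NaN before being filled. A self-contained sketch of the same value_counts and map idea that keeps integer counts might look like this. The sample data is taken from the question with string columns; using the nullable "Int64" dtype instead of filling with 'F' is an assumption made here for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question, with Column2/Column3 as strings.
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2],
        "Column2": ["02", "34", "80", "91", "09"],
        "Column3": ["34", "02", "85", "09", "34"],
    }
)

# Count how often each distinct value appears in Column3.
counts = df["Column3"].value_counts()

# Look up each Column2 value in those counts. Values absent from
# Column3 come back as NaN, so fill with 0 and cast back to int.
s = df["Column2"].map(counts).fillna(0).astype(int)

# Keep the count only where Column1 == 2. Other rows become missing,
# which the nullable Int64 dtype can hold without switching to float.
df["Column4"] = s.where(df["Column1"].eq(2)).astype("Int64")
print(df)
```

With this data every row has Column1 == 2, so Column4 holds a count on every row and matches the expected output in the question.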

Or you can use np.where here.

s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df['Column4'] = np.where(df['Column1'].eq(2), s, 'F')
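
If a pure NumPy version is wanted, closer to what the question attempted, one possible sketch uses broadcasting. The attempt in the question compares the two columns position by position and then calls np.count_nonzero without an axis, which collapses everything into a single scalar. Comparing every Column2 value against every Column3 value and counting per row avoids that. The -1 sentinel for rows where Column1 is not 2 is an assumption made here so the result stays an integer array; passing 'F' to np.where, as above, would turn the whole column into strings.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with Column2/Column3 as strings.
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2],
        "Column2": ["02", "34", "80", "91", "09"],
        "Column3": ["34", "02", "85", "09", "34"],
    }
)

col2 = df["Column2"].to_numpy(dtype=str)
col3 = df["Column3"].to_numpy(dtype=str)

# Compare every Column2 value against every Column3 value.
# The result is a boolean matrix of shape (n, n).
matches = col2[:, None] == col3[None, :]

# Row i marks where Column3 equals Column2[i]; count the True
# entries per row instead of over the whole matrix.
counts = np.count_nonzero(matches, axis=1)

# -1 is a hypothetical sentinel for rows where Column1 != 2.
df["Column4"] = np.where(df["Column1"].to_numpy() == 2, counts, -1)
print(df)
```

Note that the intermediate matrix needs n × n booleans, so for large frames the value_counts and map approach is likely to be the better choice.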

Answer 1 (accepted 2020-10-25 17:07:55)