
Pandas loop to NumPy: count occurrences of a string (as nonzero) in an array

Suppose I have the following DataFrame, with each column's element type in parentheses:

  Column1(int) Column2(str)  Column3(str)
0     2             02            34
1     2             34            02
2     2             80            85
3     2             91            09
4     2             09            34
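To reproduce the snippets below, here is a minimal sketch that builds the sample frame. It assumes Column2 and Column3 hold zero-padded strings such as "02" (as the type annotations suggest) rather than integers:

```python
import pandas as pd

# Sample data from the table above; Column2/Column3 are kept as strings
# so that values like "02" and "09" keep their leading zero.
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2],
        "Column2": ["02", "34", "80", "91", "09"],
        "Column3": ["34", "02", "85", "09", "34"],
    }
)
```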

With a pandas loop I use the following code: if Column1 == 2, count how many times that row's Column2 value occurs in Column3 and assign the count to Column4:

import pandas as pd

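# Row-by-row loop: where Column1 == 2, count how many Column3 entries
# equal this row's Column2 value and store the count in Column4.
for index in df.index:
    if df.loc[index, "Column1"] == 2:
        df.loc[index, "Column4"] = df.loc[
            df.Column3 == df.loc[index, "Column2"], "Column3"
        ].count()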

For efficiency, I am trying to use NumPy array methods instead. I tried translating the loop as follows, but without success:

import numpy as np

# turn Column3 into a NumPy array
array = df.loc[:, "Column3"].values

index = df.index
df.assign(
    Column4=lambda x: np.where(
        (x["Column1"] == 2), np.count_nonzero(array == df.loc[index, "Column2"]), "F"
    )
)
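The attempt does not work because `array == df.loc[index, "Column2"]` compares Column3 and Column2 position by position (row i against row i only), so `np.count_nonzero` returns a single number for the whole frame instead of one count per row. Note also that `df.assign` returns a new DataFrame, so its result has to be assigned back. The sketch below is one possible pure NumPy translation of the loop (not the approach used in the answer): it uses broadcasting to compare every Column2 value against every Column3 value. It builds an n × n boolean matrix, so memory grows quadratically with the number of rows.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (string-valued Column2/Column3 assumed).
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2],
        "Column2": ["02", "34", "80", "91", "09"],
        "Column3": ["34", "02", "85", "09", "34"],
    }
)

col2 = df["Column2"].to_numpy()
col3 = df["Column3"].to_numpy()

# What the attempt computes: row i vs row i only, collapsed to one scalar.
print(np.count_nonzero(col3 == col2))  # 0

# Broadcasting: col2[:, None] has shape (n, 1) and col3[None, :] has shape
# (1, n), so the comparison is an (n, n) matrix whose row i marks every
# Column3 entry equal to Column2[i]. Summing each row gives the counts.
counts = (col2[:, None] == col3[None, :]).sum(axis=1)

# Keep the count where Column1 == 2, otherwise "F". Casting the counts to
# object keeps them as integers next to the string instead of turning
# everything into strings.
df["Column4"] = np.where(df["Column1"].to_numpy() == 2, counts.astype(object), "F")
```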

Expected output:

  Column1(int) Column2(str)  Column3(str)  Column4(int)
0     2             02            34           1
1     2             34            02           2
2     2             80            85           0
3     2             91            09           0
4     2             09            34           1

You can use pd.Series.value_counts on Column3 and use the result as a mapping for Column2: pd.Series.map accepts a Series as the mapping, and Column2 values that never occur in Column3 come back as NaN, which you can fill with 0 using pd.Series.fillna.

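# Count each value in Column3, then look up every Column2 value in those
# counts; values that never appear in Column3 become NaN, filled with 0.
s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
# Assign the counts only where Column1 == 2 (other rows stay NaN).
df.loc[df['Column1'].eq(2), 'Column4'] = s
# Fills with 'F' where `Column1` is not equal to 2.
df['Column4'] = df['Column4'].fillna('F')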

   Column1  Column2  Column3  Column4
0        2        2       34      1.0
1        2       34        2      2.0
2        2       80       85      0.0
3        2       91        9      0.0
4        2        9       34      1.0
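To make the `value_counts`/`map` lookup concrete, here is a sketch that exposes the two intermediate steps, again assuming the string-valued sample frame from the question. (The printed output above appears to come from a frame where Column2/Column3 were parsed as integers, which is why "02" shows as 2; the counts are the same either way.)

```python
import pandas as pd

# Sample frame from the question (string-valued Column2/Column3 assumed).
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2],
        "Column2": ["02", "34", "80", "91", "09"],
        "Column3": ["34", "02", "85", "09", "34"],
    }
)

# Step 1: a lookup table of how many times each value occurs in Column3,
# indexed by the value itself ("34" -> 2; "02", "85", "09" -> 1 each).
lookup = df["Column3"].value_counts()

# Step 2: map each Column2 value through that table; values absent from
# Column3 ("80", "91") become NaN, which fillna turns into 0.
s = df["Column2"].map(lookup).fillna(0).astype(int)
print(s.tolist())  # [1, 2, 0, 0, 1]
```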

Alternatively, you can use np.where:

import numpy as np

s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df['Column4'] = np.where(df['Column1'].eq(2), s, 'F')
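Because np.where has to produce a single NumPy array with one dtype, combining the float counts with the string 'F' may convert the counts to strings (such as '1.0'). If you want integer counts next to 'F', one option (a sketch, not part of the original answer) is pandas' Series.where, which keeps values where the condition is True and substitutes the other value elsewhere, upcasting the column to object dtype. The sixth row below is a hypothetical addition with Column1 = 1 so that the 'F' branch is visible.

```python
import pandas as pd

# Sample frame from the question plus one hypothetical row (Column1 = 1)
# so that the "F" branch is exercised.
df = pd.DataFrame(
    {
        "Column1": [2, 2, 2, 2, 2, 1],
        "Column2": ["02", "34", "80", "91", "09", "34"],
        "Column3": ["34", "02", "85", "09", "34", "77"],
    }
)

# Per-row counts of each Column2 value in Column3, as integers.
s = df["Column2"].map(df["Column3"].value_counts()).fillna(0).astype(int)

# Keep the count where Column1 == 2, otherwise "F"; mixing integers and a
# string makes the resulting column object dtype.
df["Column4"] = s.where(df["Column1"].eq(2), "F")
print(df["Column4"].tolist())
```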
