Suppose I have the following dataframe with element types in brackets
Column1(int) Column2(str) Column3(str)
0 2 02 34
1 2 34 02
2 2 80 85
3 2 91 09
4 2 09 34
When using pandas loops I use the following code. If Column1 = 2, count how many times Column2 occurs in Column 3 and assign the count() to Column4
:
import pandas as pd
for index in df.index:
if df.loc[index, "Column"] == 2:
df.loc[index, "Column4"] = df.loc[
df.Column3 == df.loc[index, "Column2"], "Column3"
].count()
I am trying to use NumPy and array methods for efficiency. I have tried translating the method but no luck.
import numpy as np
# turn Column3 to array
array = df.loc[:, "Column3"].values
index = df.index
df.assign(
Column4=lambda x: np.where(
(x["Column1"] == 2), np.count_nonzero(array == df.loc[index, "Column2"]), "F"
)
)
Expected output
Column1(int) Column2(str) Column3(str) Column4(int)
0 2 02 34 1
1 2 34 02 2
2 2 80 85 0
3 2 91 09 0
4 2 09 34 1
You can use pd.Series.value_counts
on Column3
and use it as mapping for Column2
, you can pass Series
object to pd.Series.map
, missing values with pd.Series.fillna
with 0
s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df.loc[df['Column1'].eq(2), 'Column4'] = s
df['Column4'] = df['Column4'].fillna('F')
# Fills with 'F' where `Column1` is not equal to 2.
Column1 Column2 Column3 Column4
0 2 2 34 1.0
1 2 34 2 2.0
2 2 80 85 0.0
3 2 91 9 0.0
4 2 9 34 1.0
Or you can use np.where
here.
s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df['Column4'] = np.where(df['Column1'].eq(2), s, 'F')
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.