Pandas loop to NumPy: count occurrences of a string as nonzero in an array
Suppose I have the following dataframe, with element types in brackets:
   Column1(int)  Column2(str)  Column3(str)
0  2             02            34
1  2             34            02
2  2             80            85
3  2             91            09
4  2             09            34
When using pandas loops I use the following code. If Column1 == 2, count how many times the value in Column2 occurs in Column3 and assign that count to Column4:
import pandas as pd

for index in df.index:
    if df.loc[index, "Column1"] == 2:
        # count the rows of Column3 equal to this row's Column2 value
        df.loc[index, "Column4"] = df.loc[
            df.Column3 == df.loc[index, "Column2"], "Column3"
        ].count()
I am trying to use NumPy and array methods for efficiency. I have tried translating the loop, but with no luck.
import numpy as np

# turn Column3 into an array
array = df.loc[:, "Column3"].values
index = df.index
df.assign(
    Column4=lambda x: np.where(
        (x["Column1"] == 2), np.count_nonzero(array == df.loc[index, "Column2"]), "F"
    )
)
Expected output:
   Column1(int)  Column2(str)  Column3(str)  Column4(int)
0  2             02            34            1
1  2             34            02            2
2  2             80            85            0
3  2             91            09            0
4  2             09            34            1
You can use pd.Series.value_counts on Column3 and use the result as a mapping for Column2. pd.Series.map accepts a Series object as the mapping, and values of Column2 that never occur in Column3 come back as missing, so fill them with 0 using pd.Series.fillna.
s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df.loc[df['Column1'].eq(2), 'Column4'] = s
df['Column4'] = df['Column4'].fillna('F')
# Fills with 'F' where `Column1` is not equal to 2.
   Column1  Column2  Column3  Column4
0  2        2        34       1.0
1  2        34       2        2.0
2  2        80       85       0.0
3  2        91       9        0.0
4  2        9        34       1.0
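The output above shows Column4 as floats, because the missing values produced by map force a float dtype, while the expected output in the question has integers. The following is a minimal self-contained sketch of the same value_counts/map idea that casts the counts back to integers. The sample frame is rebuilt from the question's table, and using <NA> (via the nullable "Int64" dtype) instead of 'F' for rows where Column1 is not 2 is an assumption made here to keep the column integer-typed.

```python
import pandas as pd

# Sample data rebuilt from the question; Column2 and Column3 are strings.
df = pd.DataFrame({
    "Column1": [2, 2, 2, 2, 2],
    "Column2": ["02", "34", "80", "91", "09"],
    "Column3": ["34", "02", "85", "09", "34"],
})

# Occurrences of each value in Column3, looked up by the value in Column2.
# Values that never occur in Column3 map to NaN, so fill with 0 and cast.
counts = df["Column2"].map(df["Column3"].value_counts()).fillna(0).astype(int)

# Keep the count only where Column1 == 2; other rows become <NA>
# (an assumption in place of the 'F' marker, to keep an integer dtype).
df["Column4"] = counts.where(df["Column1"].eq(2)).astype("Int64")
print(df)
```

If the 'F' marker is required, the column has to hold mixed types (object dtype), which gives up the integer dtype.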
Or you can use np.where here:
s = df['Column2'].map(df['Column3'].value_counts()).fillna(0)
df['Column4'] = np.where(df['Column1'].eq(2), s, 'F')
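Since the question asks specifically about NumPy, here is a rough sketch of a NumPy-only variant that uses broadcasting instead of value_counts. It is not part of the answer above: the array names are illustrative, the arrays are typed in directly from the question's table (with a DataFrame they could come from df["Column2"].to_numpy() and so on), and -1 is an assumed placeholder instead of 'F' so the result stays an integer array.

```python
import numpy as np

# Columns from the question's table as plain arrays (illustrative names).
col1 = np.array([2, 2, 2, 2, 2])
col2 = np.array(["02", "34", "80", "91", "09"])
col3 = np.array(["34", "02", "85", "09", "34"])

# Compare every Column2 value (rows) with every Column3 value (columns),
# then count the matches in each row.
matches = col2[:, None] == col3[None, :]
counts = np.count_nonzero(matches, axis=1)

# Keep the count where Column1 == 2; -1 is an assumed placeholder otherwise.
column4 = np.where(col1 == 2, counts, -1)
print(column4)
```

The comparison matrix has one entry per pair of rows, so memory grows quadratically with the number of rows. For large frames the value_counts/map approach is likely the better choice.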