Pandas get position of last value based on condition for each column (efficiently)
I want to find out in which row the value 1 occurs last for each column of my dataframe. Given this last row index, I want to calculate the "recency" of the occurrence. Like so:
>> df = pandas.DataFrame({"a":[0,0,1,0,0],"b":[1,1,1,1,1],"c":[1,0,0,0,1],"d":[0,0,0,0,0]})
>> df
   a  b  c  d
0  0  1  1  0
1  0  1  0  0
2  1  1  0  0
3  0  1  0  0
4  0  1  1  0
Desired result:
>> calculate_recency_vector(df)
[3,1,1,None]
The desired result shows, for each column, "how many rows ago" the value 1 appeared for the last time. E.g. for column a, the value 1 appears last in the 3rd-last row, hence the recency of 3 in the result vector. Any ideas how to implement this?
Edit: to avoid confusion, I changed the desired output for the last column from 0 to None. This column has no recency because the value 1 does not occur at all.
Edit II: Thanks for the great answers. I have to calculate this recency vector approx. 150k times on dataframes shaped (42.250). A more efficient solution would be much appreciated.
A loop-less solution which is faster & cleaner:
>> import pandas as pd
>> def calculate_recency_for_one_column(column: pd.Series) -> int:
>>     # Keep only the entries that are truthy (non-zero).
>>     non_zero_values_of_col = column[column.astype(bool)]
>>     if non_zero_values_of_col.empty:
>>         return 0  # the value never occurs in this column
>>     # Assumes the default 0..n-1 index, so the label equals the row position.
>>     return len(column) - non_zero_values_of_col.index[-1]
>> df = pd.DataFrame({"a":[0,0,1,0,0],"b":[1,1,1,1,1],"c":[1,0,0,0,1],"d":[0,0,0,0,0]})
>> df.apply(calculate_recency_for_one_column, axis=0)
a 3
b 1
c 1
d 0
dtype: int64
Sidenote: Using DataFrame.apply() is slow (SO explanation). There exist faster solutions, like using np.where or using apply(..., raw=True). See this question for details.
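As a rough sketch of the two faster directions mentioned in the sidenote: the first function runs per column on raw NumPy arrays via apply(..., raw=True), and the second works on the whole frame at once. Both function names are made up for illustration, and the second uses argmax on a reversed boolean mask rather than np.where, so treat it as one possible variant, not the method the sidenote links to. Whether either is actually faster for your frames is worth timing yourself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 0, 0], "b": [1, 1, 1, 1, 1],
                   "c": [1, 0, 0, 0, 1], "d": [0, 0, 0, 0, 0]})

def recency_of_array(values):
    """Per-column helper (hypothetical name) meant for apply(..., raw=True)."""
    hits = np.flatnonzero(values == 1)  # row positions where the value is 1
    if hits.size == 0:
        return 0  # same "no occurrence" convention as the answer above
    return int(len(values) - hits[-1])

def recency_vector_numpy(frame, value=1):
    """Whole-frame variant (hypothetical name); assumes at least one row."""
    mask = frame.to_numpy() == value              # shape (n_rows, n_cols)
    # On the row-reversed mask, argmax finds the first True per column,
    # i.e. how many rows from the bottom the last match sits.
    rows_from_bottom = mask[::-1].argmax(axis=0)
    recency = (rows_from_bottom + 1).astype(object)
    # argmax also returns 0 for columns without any match, so mask those out.
    recency[~mask.any(axis=0)] = None
    return recency.tolist()

per_column = df.apply(recency_of_array, raw=True)  # Series indexed by column
whole_frame = recency_vector_numpy(df)             # plain Python list
```

The whole-frame variant returns None for columns without a match, as requested in the question's edit, while the raw=True helper keeps the 0 convention of the answer it mirrors.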
With this example dataframe, you can define a function as follows:
import pandas as pd

def calculate_recency_vector(df: pd.DataFrame, condition: int) -> list:
    recency_vector = []
    for col in df.columns:
        values = df[col].to_list()  # convert once instead of on every use
        last = None  # position of the last match; None if there is none
        for i, y in enumerate(values):
            if y == condition:
                last = i
        # None as sentinel, so a match in row 0 is not mistaken for "no match".
        recency = None if last is None else len(values) - last
        recency_vector.append(recency)
    return recency_vector
Running the function returns this:
calculate_recency_vector(df, 1)
[3, 1, 1, None]
One direct approach to implementing this function is to loop over each column in the DataFrame and, within that loop, use another loop to iterate over each row in the column. For each row, check if the value is 1. If it is, update a variable to store len(df[column]) - index. After the inner loop finishes, use the stored value as the recency for that column. If 1 never appears in the column, the recency is None.
import pandas

def calculate_recency_vector(df):
    recency_vector = []
    for column in df:
        last_occurrence = None  # stays None if 1 never appears
        # enumerate yields row positions, so this works for any index labels;
        # Series.iteritems() was removed in pandas 2.0.
        for position, value in enumerate(df[column]):
            if value == 1:
                last_occurrence = len(df[column]) - position
        recency_vector.append(last_occurrence)
    return recency_vector

df = pandas.DataFrame({"a":[0,0,1,0,0],"b":[1,1,1,1,1],"c":[1,0,0,0,1],"d":[0,0,0,0,0]})
print(calculate_recency_vector(df))
This
df = pandas.DataFrame({"a":[0,0,1,0,0],"b":[1,1,1,1,1],"c":[1,0,0,0,1],"d":[0,0,0,0,0]})
df.apply(lambda x : ([df.shape[0] - i for i ,v in x.items() if v==1] or [None])[-1], axis=0)
produces the desired output as a pd.Series, the only difference being that the result is float and None is replaced by pandas NaN. You could then take the desired column.
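If a plain list with None is needed, as in the question, the NaN entries can be mapped back afterwards. A minimal sketch, assuming the default 0..n-1 index (the one-liner subtracts index labels) and using a made-up variable name recency_vector:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 0, 0], "b": [1, 1, 1, 1, 1],
                   "c": [1, 0, 0, 0, 1], "d": [0, 0, 0, 0, 0]})

# Same one-liner as above: per column, collect "rows ago" for every 1
# and keep the last entry, falling back to None when there is no 1.
result = df.apply(
    lambda x: ([df.shape[0] - i for i, v in x.items() if v == 1] or [None])[-1],
    axis=0,
)

# pd.isna is True for both NaN and None, so this works whichever of the
# two pandas stored for the column without a match.
recency_vector = [None if pd.isna(v) else int(v) for v in result]
print(recency_vector)
```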