Python DataFrame: set a value to True if the last n rows are True

Question

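I want to create a new column that is True when the last n rows of another column are all True. The code below does exactly what I want. The problem is that it takes a long time to run.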

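import numpy as np
import pandas as pd

dfx = pd.DataFrame({'A': [False, False, False, False, True, True, True, True, False, True]})
n = 2  # window size: n samples cover a 10-minute range
cl_id = dfx.columns.tolist().index('A')  # positional index of column 'A', for use with .iloc
# For each row x >= n, check whether the n rows ending at x are all True; the first n rows are set to False.
l1 = [False] * n + [all(dfx.iloc[x + 1 - n:x + 1, cl_id].tolist()) for x in np.arange(n, len(dfx))]
dfx['B'] = l1
print(dfx)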
   #old_col   # New_col
       A      B
0  False  False
1  False  False
2  False  False
3  False  False
4   True  False
5   True   True  ## Here A col last two rows True, hence True
6   True   True  ## Here A col last two rows True, hence True
7   True   True  ## Here A col last two rows True, hence True
8  False  False
9   True  False

Is there a better way to do this? Running it and producing the output takes a lot of time.

Answer 1

Use pandas.Series.rolling:

n = 2
dfx["A"].rolling(n).sum().eq(n)

Output:

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8    False
9    False
Name: A, dtype: bool
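
To get the column B the question asks for, the result can be assigned back to the frame. The following is a minimal, self-contained sketch using the question's sample data. It works because the rolling sum treats True as 1, so a window of n rows sums to n only when every row in it is True.

```python
import pandas as pd

dfx = pd.DataFrame({"A": [False, False, False, False, True, True, True, True, False, True]})
n = 2

# The first n-1 rows have an incomplete window, so their rolling sum is NaN.
# NaN never equals n, which means those rows come out as False.
dfx["B"] = dfx["A"].rolling(n).sum().eq(n)
print(dfx)
```

The resulting B column matches the one shown in the question for this data.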

Benchmark against the OP's approach (roughly 1,000 times faster):

dfx = pd.DataFrame({'A':[False,False,False,False,True,True,True,True,False,True]*1000}) 

%timeit -n10 l1 = dfx["A"].rolling(n).sum().eq(n)
# 702 µs ± 88.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit -n10 l2 = [False]*n+[all(dfx.iloc[x+1-n:x+1,cl_id].tolist()) for x in np.arange(n,len(dfx))]
# 908 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

l1.tolist() == l2
# True
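
One caveat, as far as I can tell: the two approaches agree on this data, but they are not identical in general. The OP's list forces the first n rows to False, while rolling only leaves the first n-1 rows False, so they can differ at row n-1 when the column starts with n True values. Also, names assigned inside a %timeit statement are not kept in the session, so it is safer to compute both results outside %timeit before comparing them. A small sketch of the comparison follows; the function names last_n_true_loop and last_n_true_rolling are made up for this illustration.

```python
import numpy as np
import pandas as pd

def last_n_true_loop(s, n):
    # Hypothetical helper mirroring the question's list-based approach.
    vals = s.tolist()
    return [False] * n + [all(vals[x + 1 - n:x + 1]) for x in range(n, len(vals))]

def last_n_true_rolling(s, n):
    # Hypothetical helper wrapping the rolling-sum approach from this answer.
    return s.rolling(n).sum().eq(n)

# Random boolean data: from row n onward, both approaches should agree.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(1000) < 0.7)
n = 3
same_from_n = last_n_true_loop(s, n)[n:] == last_n_true_rolling(s, n).tolist()[n:]
print(same_from_n)

# Edge case: a column that starts with n True values differs at row n-1.
edge = pd.Series([True, True, True])
loop_edge = last_n_true_loop(edge, 2)
rolling_edge = last_n_true_rolling(edge, 2).tolist()
print(loop_edge, rolling_edge)
```

If row n-1 has to be False to match the original behaviour exactly, it can be overwritten after the rolling step.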

Answer 1 (accepted, 2020-09-04 03:24:48)
