How to split a pandas DataFrame into subsets with start/end conditions in Python?
I have a DataFrame that I need to split into subsets under the following condition:
start a subset: c = 1; end a subset: c = -1
Example:
```
a      b      c
False  False
False  False  -1
False  False
True   False   1    <- start of first subset
False  False
False  False   1
False  False   1
False  False
False  False   1
False  False
False  False   1
False  False
False  True   -1    <- end of first subset
False  False
False  False
False  True   -1
False  False
False  False
True   False   1    <- start of second subset
False  False  -1    <- end of second subset
```
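This may be one solution, although I'm not sure it is the most efficient approach. It basically uses `cumsum` to count subset starts and ends, combined with some and/or logic.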
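The `cumsum` part of this idea can be seen in isolation on hand-made flags. In this minimal sketch the `start` and `end` arrays are hypothetical and are assumed to alternate (start, end, start, ...) beginning with a start. A row is inside a subset while the number of starts up to and including it exceeds the number of ends strictly before it, so the closing row is still labelled.

```python
import numpy as np

# Hypothetical flags for 8 rows: 1 marks a row that opens / closes a subset
start = np.array([0, 1, 0, 0, 0, 0, 1, 0])
end = np.array([0, 0, 0, 1, 0, 0, 0, 1])

opened = start.cumsum()                                    # starts up to and including each row
closed_before = np.concatenate(([0], end.cumsum()[:-1]))   # ends strictly before each row

inside = opened > closed_before          # True while a subset is open
subset = np.where(inside, opened, 0)     # label = number of the open subset
print(subset)
# [0 1 1 1 0 0 2 2]
```

With alternating flags, `opened > closed_before` and `opened != closed_before` select the same rows, which is why the answer's code can compare with `!=`.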
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'c': [np.nan, 1, np.nan, 1, np.nan, np.nan,
                         -1, np.nan, np.nan, 1, np.nan, np.nan,
                         1, 1, np.nan, -1, -1, 1, -1]})
print(df)
```
```
      c
0   NaN
1   1.0
2   NaN
3   1.0
4   NaN
5   NaN
6  -1.0
7   NaN
8   NaN
9   1.0
10  NaN
11  NaN
12  1.0
13  1.0
14  NaN
15 -1.0
16 -1.0
17  1.0
18 -1.0
```
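```python
out = (
    df
    .assign(
        # True on the rows that really open or close a subset: non-NaN c values
        # that differ from the previous non-NaN value. fill_value=-1 makes a
        # leading -1 (no subset open yet) count as a repeat, so it is ignored.
        start_end=lambda df: df.index.isin(
            df
            .loc[lambda df: df.c.isin([1, -1])]
            .loc[lambda df: df.c.shift(1, fill_value=-1) != df.c]
            .index),
        # 1 on rows that open a subset
        start=lambda df: np.where(np.logical_and(df.start_end, df.c == 1), 1, 0),
        # 1 on rows that close a subset
        end=lambda df: np.where(np.logical_and(df.start_end, df.c == -1), 1, 0),
        # Inside a subset while starts so far differ from ends before this row;
        # shift(1) counts each end only from the next row, so the closing row
        # keeps its label.
        subset=lambda df: np.where(df.start.cumsum() != df.end.shift(1, fill_value=0).cumsum(),
                                   df.start.cumsum(),
                                   0)
    )
    .drop(columns=['start_end', 'start', 'end'])
)
print(out)
```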
```
      c  subset
0   NaN       0
1   1.0       1
2   NaN       1
3   1.0       1
4   NaN       1
5   NaN       1
6  -1.0       1
7   NaN       0
8   NaN       0
9   1.0       2
10  NaN       2
11  NaN       2
12  1.0       2
13  1.0       2
14  NaN       2
15 -1.0       2
16 -1.0       0
17  1.0       3
18 -1.0       3
```
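To actually obtain separate subset DataFrames, as the title asks, the `subset` column can be passed to `groupby`. A minimal sketch, assuming the labels printed above are available; here they are typed in by hand so the block runs on its own, and the name `subsets` is only illustrative.

```python
import numpy as np
import pandas as pd

# The example frame together with the subset labels shown above
df = pd.DataFrame({'c': [np.nan, 1, np.nan, 1, np.nan, np.nan,
                         -1, np.nan, np.nan, 1, np.nan, np.nan,
                         1, 1, np.nan, -1, -1, 1, -1],
                   'subset': [0, 1, 1, 1, 1, 1, 1, 0, 0, 2,
                              2, 2, 2, 2, 2, 2, 0, 3, 3]})

# Drop rows outside any subset (label 0), then build one DataFrame per label
subsets = [g.drop(columns='subset')
           for _, g in df[df['subset'] > 0].groupby('subset')]

for i, part in enumerate(subsets, start=1):
    print(f"subset {i}: rows {part.index.min()}-{part.index.max()}")
# subset 1: rows 1-6
# subset 2: rows 9-15
# subset 3: rows 17-18
```

Each element keeps the original row index, so it can be matched back to the full frame. A dict comprehension over the same `groupby` works equally well if you prefer to look subsets up by number.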