Slice dataset on series by condition
I have a dataset:
import pandas as pd

data = {'host': ['A','A','A','A','A','A','B','B','B','B','B','B'],
        'TS': ['1','2', '3', '7', '9','11','7','8','9','14','16', '18'],
        'Predict' : ['None','None', '134','None','None', '127','None','None', '121','None','None', '124']}
df = pd.DataFrame(data)
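I want to split the dataset into series, each ending at a row whose Predict is not 'None', and get the time differences within each series.
I have a function for the time difference, and I tried to extract an index for the series, but I don't know how to use it: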
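def timediffs(series):
    # the column is named 'TS' (not 'ts') and holds strings, so cast before diff()
    series['tdiff'] = series['TS'].astype(int).diff().fillna(0.0)
    return series

# Predict holds the string 'None', so notna() would be True everywhere
predict_index = df.index.to_series().where(df['Predict'].ne('None')).bfill()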
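One way the predict_index idea could be used (a sketch, assuming pandas and the sample data above): each row's backfilled label is the index of the row that closes its series, so it can serve directly as a groupby key, and the same labels can look up the prediction for every row. The column names Time_diff and New_predict follow the desired output below. This assumes the last row always has a prediction; otherwise bfill leaves NaN and the astype(int) call fails.

```python
import pandas as pd

data = {'host': ['A','A','A','A','A','A','B','B','B','B','B','B'],
        'TS': ['1','2','3','7','9','11','7','8','9','14','16','18'],
        'Predict': ['None','None','134','None','None','127',
                    'None','None','121','None','None','124']}
df = pd.DataFrame(data)
df['TS'] = df['TS'].astype(int)  # TS arrives as strings

# For every row: the index label of the next row whose Predict is not 'None'
# (the row that closes its series). Labels become float because of the NaNs.
predict_index = df.index.to_series().where(df['Predict'].ne('None')).bfill()

# Rows sharing a closing label form one series; diff() restarts in each group
df['Time_diff'] = df.groupby(predict_index)['TS'].diff().fillna(0).astype(int)

# Look up the prediction stored on each series' closing row
df['New_predict'] = df.loc[predict_index.astype(int), 'Predict'].to_numpy()
print(df)
```

Grouping by the closing row's label rather than by the prediction value keeps two series apart even if they happen to end with the same prediction.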
In the end, I want to get a dataset like this:
new_data = {'host': ['A','A','A','A','A','A','B','B','B','B','B','B'],
'TS': ['1','2', '3', '7', '9','11','7','8','9','14','16', '19'],
'Predict' : ['None','None', '134','None','None', '127','None','None', '121','None','None', '124'],
'Time_diff' : ['0','1','1','0','2','2', '0','1','1','0','2','3',],
'New_predict' : ['134','134','134','127','127','127','121','121','121','124','124','124',]
}
new_df = pd.DataFrame(new_data)
First, we replace 'None' with NaN. Then we use backfill (bfill) to create the New_predict column, and finally we use GroupBy.diff to get Time_diff:
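import numpy as np
import pandas as pd
df = pd.DataFrame(data)
df['TS'] = df['TS'].astype(int)  # TS holds strings in the sample data, so diff() needs numbers
df['New_predict'] = df.replace('None', np.nan).loc[:, 'Predict'].bfill()
df['Time_diff'] = df.groupby('New_predict')['TS'].diff().fillna(0)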
host TS Predict New_predict Time_diff
0 A 1 None 134 0.0
1 A 2 None 134 1.0
2 A 3 134 134 1.0
3 A 7 None 127 0.0
4 A 9 None 127 2.0
5 A 11 127 127 2.0
6 B 7 None 121 0.0
7 B 8 None 121 1.0
8 B 9 121 121 1.0
9 B 14 None 124 0.0
10 B 16 None 124 2.0
11 B 18 124 124 2.0
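A caveat with grouping on New_predict: rows are grouped by the predicted value itself, so two separate series that end with the same prediction would be merged into one group, and the first row of the later series would get a non-zero Time_diff. The sketch below uses a small hypothetical frame to show this and keys each series with a running counter instead, assuming (as in the question) that a series ends at every row whose Predict is not 'None'.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two series that both end with a prediction of 134
df = pd.DataFrame({'host': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'TS': [1, 2, 3, 7, 8, 9],
                   'Predict': ['None', 'None', '134', 'None', 'None', '134']})

is_end = df['Predict'].ne('None')                    # True on the row that closes a series
series_id = is_end.shift(fill_value=False).cumsum()  # counts closed series before each row

df['New_predict'] = df['Predict'].replace('None', np.nan).bfill()
by_value = df.groupby('New_predict')['TS'].diff().fillna(0)  # both series collapse into one group
by_series = df.groupby(series_id)['TS'].diff().fillna(0)     # each series gets its own group

print(by_value.tolist())   # [0.0, 1.0, 1.0, 4.0, 1.0, 1.0]
print(by_series.tolist())  # [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```

With the question's own data all four predictions are distinct, so both keys give the same result there.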
In your sample data you first need to preprocess the data: convert TS to numbers and turn the 'None' strings in Predict into NaN (or None):
import pandas as pd

df = pd.DataFrame(data)
df['TS'] = df['TS'].astype(int)
df['Predict'] = pd.to_numeric(df['Predict'], errors='coerce')
# if you only need to replace the 'None' strings with NaN (values stay strings):
# df['Predict'] = df['Predict'].mask(df['Predict'] == 'None')
Then backfill the missing values in the Predict column only, and for Time_diff use DataFrameGroupBy.diff and replace the first value of each group with 0:
df['New_predict'] = df['Predict'].bfill()
df['Time_diff'] = df.groupby('New_predict')['TS'].diff().fillna(0).astype(int)
print(df)
host TS Predict New_predict Time_diff
0 A 1 NaN 134.0 0
1 A 2 NaN 134.0 1
2 A 3 134.0 134.0 1
3 A 7 NaN 127.0 0
4 A 9 NaN 127.0 2
5 A 11 127.0 127.0 2
6 B 7 NaN 121.0 0
7 B 8 NaN 121.0 1
8 B 9 121.0 121.0 1
9 B 14 NaN 124.0 0
10 B 16 NaN 124.0 2
11 B 18 124.0 124.0 2
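The two preprocessing options leave different dtypes behind, which is why New_predict prints as 134.0 in the output above: pd.to_numeric produces float64 (NaN forces a float column), while the mask alternative keeps the values as strings in an object column. A small illustration on a hypothetical three-value Series; the nullable Int64 dtype is shown as one possible way to avoid the trailing .0.

```python
import pandas as pd

# Hypothetical sample mirroring the Predict column of the question
s = pd.Series(['None', 'None', '134'])

as_float = pd.to_numeric(s, errors='coerce')  # 'None' cannot be parsed -> NaN, column becomes float64
as_masked = s.mask(s == 'None')               # 'None' -> NaN, '134' stays a string (object dtype)
as_int = as_float.astype('Int64')             # nullable integers: 134 instead of 134.0, <NA> for missing

print(as_float.dtype)   # float64
print(as_masked.dtype)  # object
print(as_int.dtype)     # Int64
```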