Pandas create new classification column from timestamp
I'm trying to create a new classification column, 'Stages_So', and add it to my original data:
Event_Code Timestamp
2053 13/08/2016 11:30
1029 10/09/2016 14:00
2053 02/10/2016 13:15
2053 06/11/2016 16:30
2053 19/11/2016 15:00
2053 03/12/2016 17:30
1029 02/01/2017 15:00
1029 05/02/2017 16:00
2053 11/02/2017 15:00
1029 04/03/2017 15:00
2053 01/04/2017 14:00
1029 21/05/2017 14:00
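To reproduce the problem, here is a minimal sketch that rebuilds the sample frame. It assumes the timestamps are day/month/year, which rows such as 13/08/2016 and 21/05/2017 imply. The names `raw` and `parsed` are only illustrative. Giving `pd.to_datetime` an explicit format keeps ambiguous rows like 02/10/2016 from being read month-first.

```python
import pandas as pd

# Rows copied from the sample table; timestamps stay as day/month/year strings.
raw = [
    (2053, "13/08/2016 11:30"), (1029, "10/09/2016 14:00"),
    (2053, "02/10/2016 13:15"), (2053, "06/11/2016 16:30"),
    (2053, "19/11/2016 15:00"), (2053, "03/12/2016 17:30"),
    (1029, "02/01/2017 15:00"), (1029, "05/02/2017 16:00"),
    (2053, "11/02/2017 15:00"), (1029, "04/03/2017 15:00"),
    (2053, "01/04/2017 14:00"), (1029, "21/05/2017 14:00"),
]
df = pd.DataFrame(raw, columns=["Event_Code", "Timestamp"])

# Parse with an explicit day-first format so "02/10/2016" means 2 October,
# not 10 February.
parsed = pd.to_datetime(df["Timestamp"], format="%d/%m/%Y %H:%M")
print(parsed.dt.month.tolist())
# [8, 9, 10, 11, 11, 12, 1, 2, 2, 3, 4, 5]
```

`dayfirst=True` is the looser alternative when the exact format is not fixed.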
I tried the following function:
def label_stage(row):
    if row['Timestamp'] > '2016-08-12' and row['Timestamp'] < '2016-11-07':
        return 0
    if row['Timestamp'] > '2016-11-18' and row['Timestamp'] < '2017-02-06':
        return 1
    if row['Timestamp'] > '2017-02-10' and row['Timestamp'] < '2017-05-22':
        return 2

df['Stages_So'] = df.apply(lambda row: label_stage(row), axis=1)
But it raises an error:
TypeError: ("Cannot compare type 'Timestamp' with type 'str'", 'occurred at index 957')
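The error comes from comparing a scalar `pd.Timestamp` (which is what `row['Timestamp']` is inside `apply` with `axis=1`) against a string. A whole datetime64 column, by contrast, accepts date strings in comparisons. Here is a minimal sketch of the difference, assuming a reasonably recent pandas. The exact wording of the error message differs between versions, so it is printed rather than checked.

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(["2016-08-13 11:30", "2016-11-19 15:00"]))

# Vectorized: a datetime64 Series parses the string bound and compares element-wise.
mask = ts > "2016-08-12"
print(mask.tolist())  # [True, True]

# Row-wise: each value is a scalar pd.Timestamp, and ordering a scalar
# Timestamp against a plain str raises TypeError.
try:
    _ = ts.iloc[0] > "2016-08-12"
    scalar_ok = True
except TypeError as exc:
    scalar_ok = False
    print("TypeError:", exc)

# Wrapping the bound in pd.Timestamp makes the scalar comparison valid.
print(ts.iloc[0] > pd.Timestamp("2016-08-12"))  # True
```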
You need to convert the column to datetimes with to_datetime first, and then compare it against datetimes (pd.Timestamp) rather than strings:
import pandas as pd

# The strings are day/month/year, so pass the format explicitly;
# without it, ambiguous dates such as 02/10/2016 can be read month-first.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')

def label_stage(row):
    # Timestamp vs Timestamp comparisons; rows outside every range return None (NaN).
    if (row['Timestamp'] > pd.Timestamp('2016-08-12') and
            row['Timestamp'] < pd.Timestamp('2016-11-07')):
        return 0
    if (row['Timestamp'] > pd.Timestamp('2016-11-18') and
            row['Timestamp'] < pd.Timestamp('2017-02-06')):
        return 1
    if (row['Timestamp'] > pd.Timestamp('2017-02-10') and
            row['Timestamp'] < pd.Timestamp('2017-05-22')):
        return 2

df['Stages_So'] = df.apply(label_stage, axis=1)
print(df)

    Event_Code           Timestamp  Stages_So
0         2053 2016-08-13 11:30:00          0
1         1029 2016-09-10 14:00:00          0
2         2053 2016-10-02 13:15:00          0
3         2053 2016-11-06 16:30:00          0
4         2053 2016-11-19 15:00:00          1
5         2053 2016-12-03 17:30:00          1
6         1029 2017-01-02 15:00:00          1
7         1029 2017-02-05 16:00:00          1
8         2053 2017-02-11 15:00:00          2
9         1029 2017-03-04 15:00:00          2
10        2053 2017-04-01 14:00:00          2
11        1029 2017-05-21 14:00:00          2
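If any timestamp falls outside all three ranges (for instance in one of the gaps between stages), `label_stage` returns `None` for that row. pandas stores that as NaN, and the whole column becomes float. Here is a small sketch using a hypothetical gap date (10 November 2016). Casting to pandas' nullable `Int64` dtype keeps the labels as integers while preserving the missing value.

```python
import pandas as pd

def label_stage(row):
    # Same strict ranges as above; rows in a gap fall through and return None.
    t = row["Timestamp"]
    if pd.Timestamp("2016-08-12") < t < pd.Timestamp("2016-11-07"):
        return 0
    if pd.Timestamp("2016-11-18") < t < pd.Timestamp("2017-02-06"):
        return 1
    if pd.Timestamp("2017-02-10") < t < pd.Timestamp("2017-05-22"):
        return 2
    return None

# Hypothetical frame: the middle row (2016-11-10) lies between stage 0 and stage 1.
df = pd.DataFrame({
    "Event_Code": [2053, 2053, 1029],
    "Timestamp": pd.to_datetime(["2016-08-13 11:30", "2016-11-10 09:00", "2017-05-21 14:00"]),
})
stages = df.apply(label_stage, axis=1)
print(stages.dtype)  # float64, because the None became NaN

# Nullable integer dtype: 0 and 2 stay integers, the gap row becomes <NA>.
df["Stages_So"] = stages.astype("Int64")
print(df)
```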
Another, faster solution builds vectorized boolean masks and picks the label with numpy.select:
import numpy as np
import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
# One boolean mask per stage; a datetime64 column can be compared with date strings.
m1 = (df['Timestamp'] > '2016-08-12') & (df['Timestamp'] < '2016-11-07')
m2 = (df['Timestamp'] > '2016-11-18') & (df['Timestamp'] < '2017-02-06')
m3 = (df['Timestamp'] > '2017-02-10') & (df['Timestamp'] < '2017-05-22')
# The first matching mask wins; rows matching none get NaN, so the column is float.
df['Stages_So'] = np.select([m1, m2, m3], [0, 1, 2], default=np.nan)
print(df)

    Event_Code           Timestamp  Stages_So
0         2053 2016-08-13 11:30:00        0.0
1         1029 2016-09-10 14:00:00        0.0
2         2053 2016-10-02 13:15:00        0.0
3         2053 2016-11-06 16:30:00        0.0
4         2053 2016-11-19 15:00:00        1.0
5         2053 2016-12-03 17:30:00        1.0
6         1029 2017-01-02 15:00:00        1.0
7         1029 2017-02-05 16:00:00        1.0
8         2053 2017-02-11 15:00:00        2.0
9         1029 2017-03-04 15:00:00        2.0
10        2053 2017-04-01 14:00:00        2.0
11        1029 2017-05-21 14:00:00        2.0
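When more stages are added, writing one mask per stage by hand gets repetitive. Here is a sketch that keeps the boundaries in a list and builds the masks in a loop, still using `np.select`. `STAGES` and `add_stage_column` are hypothetical names, and the bounds are the exclusive ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical table of stage boundaries: (exclusive start, exclusive end, label).
STAGES = [
    ("2016-08-12", "2016-11-07", 0),
    ("2016-11-18", "2017-02-06", 1),
    ("2017-02-10", "2017-05-22", 2),
]

def add_stage_column(df, col="Timestamp", stages=STAGES):
    """Return a copy of df with a Stages_So column; rows in no stage get NaN."""
    ts = df[col]
    conditions = [(ts > start) & (ts < end) for start, end, _ in stages]
    labels = [label for _, _, label in stages]
    out = df.copy()
    out["Stages_So"] = np.select(conditions, labels, default=np.nan)
    return out

df = pd.DataFrame({
    "Timestamp": pd.to_datetime(
        ["13/08/2016 11:30", "10/11/2016 09:00", "11/02/2017 15:00"],
        format="%d/%m/%Y %H:%M",
    )
})
print(add_stage_column(df))
```

Returning a copy leaves the input frame untouched. Assigning `df['Stages_So'] = ...` directly, as in the answer, works just as well.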
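Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.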