
Pandas: create a new classification column from a timestamp

I am trying to create a new classification column, 'Stages_So', and add it to my original data:

Event_Code Timestamp
2053    13/08/2016 11:30
1029    10/09/2016 14:00
2053    02/10/2016 13:15
2053    06/11/2016 16:30
2053    19/11/2016 15:00
2053    03/12/2016 17:30
1029    02/01/2017 15:00
1029    05/02/2017 16:00
2053    11/02/2017 15:00
1029    04/03/2017 15:00
2053    01/04/2017 14:00
1029    21/05/2017 14:00

I tried the following function.

def label_stage(row):
    if row['Timestamp'] > '2016-08-12' and row['Timestamp'] < '2016-11-07':
        return 0
    if row['Timestamp'] > '2016-11-18' and row['Timestamp'] < '2017-02-06':
        return 1
    if row['Timestamp'] > '2017-02-10' and row['Timestamp'] < '2017-05-22':
        return 2


df['Stages_So'] = df.apply(lambda row: label_stage(row), axis=1)

But it raises an error: TypeError: ("Cannot compare type 'Timestamp' with type 'str'", 'occurred at index 957')

You need to convert the column to datetimes with to_datetime first, and then compare it against datetime values:

import pandas as pd

# The sample dates are day-first (dd/mm/yyyy), so say so explicitly;
# otherwise 10/09/2016 can be read as October 9 instead of September 10.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], dayfirst=True)

def label_stage(row):
    # Parentheses let each condition span two lines.
    if (row['Timestamp'] > pd.Timestamp('2016-08-12') and
            row['Timestamp'] < pd.Timestamp('2016-11-07')):
        return 0
    if (row['Timestamp'] > pd.Timestamp('2016-11-18') and
            row['Timestamp'] < pd.Timestamp('2017-02-06')):
        return 1
    if (row['Timestamp'] > pd.Timestamp('2017-02-10') and
            row['Timestamp'] < pd.Timestamp('2017-05-22')):
        return 2
    # Rows outside every range fall through and return None (shown as NaN).

df['Stages_So'] = df.apply(label_stage, axis=1)
print(df)
    Event_Code           Timestamp  Stages_So
0         2053 2016-08-13 11:30:00          0
1         1029 2016-09-10 14:00:00          0
2         2053 2016-10-02 13:15:00          0
3         2053 2016-11-06 16:30:00          0
4         2053 2016-11-19 15:00:00          1
5         2053 2016-12-03 17:30:00          1
6         1029 2017-01-02 15:00:00          1
7         1029 2017-02-05 16:00:00          1
8         2053 2017-02-11 15:00:00          2
9         1029 2017-03-04 15:00:00          2
10        2053 2017-04-01 14:00:00          2
11        1029 2017-05-21 14:00:00          2
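
The timestamps in the question are written day first (for example 13/08/2016), so it may be safer to spell out the layout when parsing instead of relying on inference. A minimal sketch, assuming every string follows the `%d/%m/%Y %H:%M` layout seen in the sample (the variable names `raw`, `parsed` and `in_first_stage` are only for illustration):

```python
import pandas as pd

# Two of the sample strings from the question (day/month/year order).
raw = pd.Series(['13/08/2016 11:30', '10/09/2016 14:00'])

# An explicit format removes any ambiguity between day and month.
parsed = pd.to_datetime(raw, format='%d/%m/%Y %H:%M')

# Once the column holds datetimes, comparing against pd.Timestamp works.
in_first_stage = (
    (parsed > pd.Timestamp('2016-08-12')) &
    (parsed < pd.Timestamp('2016-11-07'))
)

print(parsed.dt.month.tolist())   # [8, 9]
print(in_first_stage.tolist())    # [True, True]
```

If some strings do not match the format, to_datetime raises an error, which tends to surface bad input earlier than a silent day/month swap would.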

Another, faster solution:

import numpy as np
import pandas as pd

# The sample dates are day-first (dd/mm/yyyy).
df['Timestamp'] = pd.to_datetime(df['Timestamp'], dayfirst=True)

# One boolean mask per stage, computed on the whole column at once.
m1 = (df['Timestamp'] > '2016-08-12') & (df['Timestamp'] < '2016-11-07')
m2 = (df['Timestamp'] > '2016-11-18') & (df['Timestamp'] < '2017-02-06')
m3 = (df['Timestamp'] > '2017-02-10') & (df['Timestamp'] < '2017-05-22')

# Rows matching no mask get NaN, which makes the column float.
df['Stages_So'] = np.select([m1, m2, m3], [0, 1, 2], default=np.nan)
print(df)
    Event_Code           Timestamp  Stages_So
0         2053 2016-08-13 11:30:00        0.0
1         1029 2016-09-10 14:00:00        0.0
2         2053 2016-10-02 13:15:00        0.0
3         2053 2016-11-06 16:30:00        0.0
4         2053 2016-11-19 15:00:00        1.0
5         2053 2016-12-03 17:30:00        1.0
6         1029 2017-01-02 15:00:00        1.0
7         1029 2017-02-05 16:00:00        1.0
8         2053 2017-02-11 15:00:00        2.0
9         1029 2017-03-04 15:00:00        2.0
10        2053 2017-04-01 14:00:00        2.0
11        1029 2017-05-21 14:00:00        2.0
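
If the number of stages grows, the masks can be built from a list of date ranges instead of being written out one by one. The following is only a sketch of that idea: the helper name `label_stages` and the `ranges` list are hypothetical, and the second row (10/11/2016) is an invented value placed in the gap between stages to show the NaN case.

```python
import numpy as np
import pandas as pd

def label_stages(timestamps, ranges):
    """Return the index of the first (start, end) range that strictly
    contains each timestamp, or NaN when no range matches."""
    conditions = [
        (timestamps > pd.Timestamp(start)) & (timestamps < pd.Timestamp(end))
        for start, end in ranges
    ]
    return np.select(conditions, list(range(len(ranges))), default=np.nan)

# Same boundaries as in the answer above.
ranges = [
    ('2016-08-12', '2016-11-07'),
    ('2016-11-18', '2017-02-06'),
    ('2017-02-10', '2017-05-22'),
]

df = pd.DataFrame({
    'Timestamp': pd.to_datetime(
        ['13/08/2016 11:30', '10/11/2016 09:00',   # second value is invented
         '19/11/2016 15:00', '11/02/2017 15:00'],
        format='%d/%m/%Y %H:%M',
    )
})
df['Stages_So'] = label_stages(df['Timestamp'], ranges)
print(df)
```

Because np.select takes the first matching condition, overlapping ranges would resolve to the earlier entry in `ranges`.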

