How to get non-continuous date times in a dataframe datetime column in pandas
I have a datetime-based dataframe, shown below:
timestamp value ... metric
36 2014-04-02 17:20:00 125.098263 ... 25.098263
14 2014-04-06 16:25:00 140.072787 ... 265.171050
10 2014-04-11 09:00:00 127.882020 ... 393.053070
45 2014-04-11 09:05:00 115.705719 ... 508.758789
24 2014-04-11 09:15:00 127.261178 ... 636.019967
17 2014-04-11 09:20:00 121.157997 ... 757.177965
49 2014-04-11 09:25:00 120.468468 ... 877.646433
8 2014-04-11 09:45:00 135.642696 ... 1013.289128
33 2014-04-11 09:55:00 125.210049 ... 1138.499178
19 2014-04-11 10:05:00 159.259713 ... 1297.758890
52 2014-04-11 10:20:00 150.082482 ... 1447.841373
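I want to create a new column named "diff_col" that contains either "same" or "diff". If a row's date is not continuous with its neighbours, it should be marked "diff"; otherwise "same". In the dataframe above, 2014-04-02 17:20:00 and 2014-04-06 16:25:00 fall on different dates from the remaining datetime values.
How can I create diff_col?
I tried: df['diff_col']=df.groupby(pd.Grouper(key = 'timestamp', freq='1D'))
but it did not create the expected column. The dataframe I need is as follows: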
timestamp value ... metric diff_col
36 2014-04-02 17:20:00 125.098263 ... 25.098263 diff
14 2014-04-06 16:25:00 140.072787 ... 265.171050 diff
10 2014-04-11 09:00:00 127.882020 ... 393.053070 same
45 2014-04-11 09:05:00 115.705719 ... 508.758789 same
24 2014-04-11 09:15:00 127.261178 ... 636.019967 same
17 2014-04-11 09:20:00 121.157997 ... 757.177965 same
49 2014-04-11 09:25:00 120.468468 ... 877.646433 same
8 2014-04-11 09:45:00 135.642696 ... 1013.289128 same
33 2014-04-11 09:55:00 125.210049 ... 1138.499178 same
19 2014-04-11 10:05:00 159.259713 ... 1297.758890 same
52 2014-04-11 10:20:00 150.082482 ... 1447.841373 same
Please advise.
Thanks, Kumar
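You can compare successive rows to check whether they fall on the same date (extracted with dt.normalize) and use the result as a grouper to get each group's size with groupby.transform('size'). If the size is > 1, set 'same', else 'diff', with the help of numpy.where: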
import numpy as np
import pandas as pd
# ensure datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])
# get day (timestamps truncated to midnight)
s = df['timestamp'].dt.normalize()
# compare successive rows and identify group size
df['diff_col'] = np.where(df.groupby(s.ne(s.shift()).cumsum())
                            .transform('size').gt(1),
                          'same', 'diff')
Output:
timestamp value ... metric diff_col
36 2014-04-02 17:20:00 125.098263 ... 25.098263 diff
14 2014-04-06 16:25:00 140.072787 ... 265.171050 diff
10 2014-04-11 09:00:00 127.882020 ... 393.053070 same
45 2014-04-11 09:05:00 115.705719 ... 508.758789 same
24 2014-04-11 09:15:00 127.261178 ... 636.019967 same
17 2014-04-11 09:20:00 121.157997 ... 757.177965 same
49 2014-04-11 09:25:00 120.468468 ... 877.646433 same
8 2014-04-11 09:45:00 135.642696 ... 1013.289128 same
33 2014-04-11 09:55:00 125.210049 ... 1138.499178 same
19 2014-04-11 10:05:00 159.259713 ... 1297.758890 same
52 2014-04-11 10:20:00 150.082482 ... 1447.841373 same
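To see what each step contributes, here is a minimal self-contained sketch of the same approach. The small dataframe is an assumption built from the first five timestamps in the question, and the names day, run_id and run_size are illustrative only. It groups the day Series itself rather than the whole dataframe, which should give the same sizes.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the first five rows of the question's dataframe
df = pd.DataFrame({
    "timestamp": [
        "2014-04-02 17:20:00",
        "2014-04-06 16:25:00",
        "2014-04-11 09:00:00",
        "2014-04-11 09:05:00",
        "2014-04-11 09:15:00",
    ],
    "value": [125.098263, 140.072787, 127.882020, 115.705719, 127.261178],
})
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Truncate each timestamp to midnight so rows on the same day compare equal
day = df["timestamp"].dt.normalize()

# A new run starts whenever the day differs from the previous row's day;
# the cumulative sum turns those start flags into a run identifier
run_id = day.ne(day.shift()).cumsum()

# Broadcast each run's length back onto its rows
run_size = day.groupby(run_id).transform("size")

# Runs of a single row are "diff", longer runs are "same"
df["diff_col"] = np.where(run_size.gt(1), "same", "diff")

# Show the intermediate values next to the result
print(df.assign(day=day, run_id=run_id, run_size=run_size))
```

Note that this labels runs of consecutive rows: if a date reappears later after a different day, it starts a new run, so the dataframe is assumed to be sorted by timestamp as in the question.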