I have a column in a pandas DataFrame that consists of lists of timestamps. For every row, I want to replace the list of timestamps with a list of the hour of each timestamp. Below is an example:
import pandas as pd
df = pd.DataFrame({'id': [1, 2], 'time': [['2017-09-05 03:34:51', '2016-03-07 05:24:55'], ['2016-02-06 03:14:21', '2014-08-09 09:12:44', '2011-05-02 07:43:21']]})
I would like a new column named 'hour' where
df['hour'] = [ [3,5], [3,9,7] ]
I tried various approaches using map() and apply(), but none produced the desired outcome. Any help is very much appreciated.
Use apply + to_datetime.
s = df.time.apply(lambda x: pd.to_datetime(x, errors='coerce').hour.tolist() )
s
0 [3, 5]
1 [3, 9, 7]
Name: time, dtype: object
df['hour'] = s
df
id time hour
0 1 [2017-09-05 03:34:51, 2016-03-07 05:24:55] [3, 5]
1 2 [2016-02-06 03:14:21, 2014-08-09 09:12:44, 201... [3, 9, 7]
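Statutory warning: this is inefficient in general, because a column of lists is stored as Python objects (dtype object), so pandas has to process it row by row instead of using vectorized datetime operations.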
If you want to know how I'd store this data, it'd be something like:
df
id time
0 1 2017-09-05 03:34:51
1 1 2016-03-07 05:24:55
2 2 2016-02-06 03:14:21
3 2 2014-08-09 09:12:44
4 2 2011-05-02 07:43:21
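If your data already lives in a column of lists, one way to get to this one-timestamp-per-row layout is DataFrame.explode (available from pandas 0.25). This is only a sketch: it assumes the lists hold timestamp strings, and the names wide and long_df are illustrative, not from the question.

```python
import pandas as pd

# Original layout: one list of timestamp strings per row (assumed to be strings).
wide = pd.DataFrame({
    'id': [1, 2],
    'time': [
        ['2017-09-05 03:34:51', '2016-03-07 05:24:55'],
        ['2016-02-06 03:14:21', '2014-08-09 09:12:44', '2011-05-02 07:43:21'],
    ],
})

# explode() produces one row per list element and repeats the 'id' value.
# reset_index(drop=True) replaces the repeated index with a fresh 0..n-1 index.
long_df = wide.explode('time').reset_index(drop=True)

# Convert the exploded strings to a real datetime64 column.
long_df['time'] = pd.to_datetime(long_df['time'])

print(long_df)
```

After this, long_df has five rows, with id repeated for each timestamp that came from the same list.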
Now, getting the hour is as easy as:
h = pd.to_datetime(df.time).dt.hour
h
0 3
1 5
2 3
3 9
4 7
Name: time, dtype: int64
df['hour'] = h
If you want to perform group-wise computation, you can always use df.groupby.
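As a rough illustration of what that could look like on the long layout, the sketch below rebuilds the per-id list of hours from the question and also computes the earliest hour per id. The names hours_per_id and earliest_hour are made up for this example.

```python
import pandas as pd

# Long layout: one timestamp per row, as in the table above.
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'time': pd.to_datetime([
        '2017-09-05 03:34:51', '2016-03-07 05:24:55',
        '2016-02-06 03:14:21', '2014-08-09 09:12:44', '2011-05-02 07:43:21',
    ]),
})
df['hour'] = df['time'].dt.hour

# Collect the hours of each id back into a list (the shape asked for in the question).
hours_per_id = df.groupby('id')['hour'].agg(list)

# Any other group-wise reduction works the same way, e.g. the earliest hour per id.
earliest_hour = df.groupby('id')['hour'].min()

print(hours_per_id)
print(earliest_hour)
```

Both results are indexed by id, so they can be joined back onto a one-row-per-id table if you really need the list form in the end.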