
Replace a list of timestamps in a pandas DataFrame with a list of hours, one per timestamp, in every row

I have a column in a pandas DataFrame that consists of lists of timestamps. For every row, I want to replace the list of timestamps with a list containing the hour of each timestamp. Below is an example:

import pandas as pd
df = pd.DataFrame({'id': [1, 2], 'time': [['2017-09-05 03:34:51', '2016-03-07 05:24:55'], ['2016-02-06 03:14:21', '2014-08-09 09:12:44', '2011-05-02 07:43:21']]})

I would like a new column named 'hour' where

df['hour'] = [ [3,5], [3,9,7] ]

I tried various approaches using map() and apply(), but nothing produced the desired outcome. Any help is very much appreciated.

Use apply together with pd.to_datetime.

s = df.time.apply(lambda x: pd.to_datetime(x, errors='coerce').hour.tolist() )
s

0       [3, 5]
1    [3, 9, 7]
Name: time, dtype: object

df['hour'] = s
df

   id                                               time       hour
0   1         [2017-09-05 03:34:51, 2016-03-07 05:24:55]     [3, 5]
1   2  [2016-02-06 03:14:21, 2014-08-09 09:12:44, 201...  [3, 9, 7]

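A word of warning: this is inefficient in general, because you have a column of lists. Such a column has object dtype, so pandas has to call the Python function once per row instead of running a single vectorized operation.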


If you want to know how I'd store this data, it'd be something like:

df

   id                 time
0   1  2017-09-05 03:34:51
1   1  2016-03-07 05:24:55
2   2  2016-02-06 03:14:21
3   2  2014-08-09 09:12:44
4   2  2011-05-02 07:43:21
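
For what it's worth, one way to get from the list column to this layout is DataFrame.explode, which needs pandas 0.25 or later. This is only a minimal sketch, assuming the timestamps are stored as strings; the names wide and long are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in the question's original layout (one list per row).
wide = pd.DataFrame({
    'id': [1, 2],
    'time': [
        ['2017-09-05 03:34:51', '2016-03-07 05:24:55'],
        ['2016-02-06 03:14:21', '2014-08-09 09:12:44', '2011-05-02 07:43:21'],
    ],
})

# explode() emits one row per list element and repeats 'id' alongside it.
# The original index is repeated too, so reset it to get 0..n-1.
long = wide.explode('time').reset_index(drop=True)

# After exploding, the column still holds strings; convert it to datetime64.
long['time'] = pd.to_datetime(long['time'])
print(long)
```
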

Now, getting the hour is as easy as:

h = pd.to_datetime(df.time).dt.hour
h

0    3
1    5
2    3
3    9
4    7
Name: time, dtype: int64

df['hour'] = h

If you want to perform group-wise computation, you can always use df.groupby.
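
As a rough illustration of the groupby route, the sketch below rebuilds the per-id lists of hours from the flat layout and also computes a simple per-id aggregate. The frame name long and the result names are hypothetical, and the data simply mirrors the example above.

```python
import pandas as pd

# Hypothetical flat frame: one timestamp per row, 'id' repeated.
long = pd.DataFrame({
    'id': [1, 1, 2, 2, 2],
    'time': pd.to_datetime([
        '2017-09-05 03:34:51', '2016-03-07 05:24:55',
        '2016-02-06 03:14:21', '2014-08-09 09:12:44', '2011-05-02 07:43:21',
    ]),
})
long['hour'] = long['time'].dt.hour

# Collect the hours of each id back into a list, if the list form is really needed.
hours_per_id = long.groupby('id')['hour'].agg(list)

# Ordinary aggregations work the same way, e.g. the earliest hour seen per id.
earliest_hour = long.groupby('id')['hour'].min()

print(hours_per_id)
print(earliest_hour)
```

Keeping the data flat and grouping only when needed means the list form becomes an output you derive on demand rather than the way the data is stored.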
