简体   繁体   English

用每行中每个时间戳的列表替换pandas数据框中的时间戳列表

[英]Replace a list of timestamps in a pandas dataframe with a list of our for every timestamp in every row

I have a column in a pandas dataframe that consist of lists containing timestamps. 我在pandas数据框中有一列,其中包含包含时间戳的列表。 I want to replace this list of timestamps with a list of the hour of each timestamp for every row. 我想将此时间戳列表替换为每一行每个时间戳小时的列表。 Below is an example 下面是一个例子

df = pd.DataFrame( {'id':[1,2], 'time':[ [2017-09-05 03:34:51,2016-03-07 05:24:55], [2016-02-06 03:14:21,2014-08-09 09:12:44, 2011-05-02 07:43:21] ] }) 

I would like a new column named 'hour' where 我想要一个名为“小时”的新列,

df['hour'] = [ [3,5], [3,9,7] ]

I tried different functionalities using map() and apply() but nothing produced the desired outcome, any help is very much appreciated. 我使用map()和apply()尝试了不同的功能,但没有任何方法产生期望的结果,非常感谢您的帮助。

Use apply + to_datetime . 使用apply + to_datetime

s = df.time.apply(lambda x: pd.to_datetime(x, errors='coerce').hour.tolist() )
s

0       [3, 5]
1    [3, 9, 7]
Name: time, dtype: object

df['hour'] = s
df

   id                                               time       hour
0   1         [2017-09-05 03:34:51, 2016-03-07 05:24:55]     [3, 5]
1   2  [2016-02-06 03:14:21, 2014-08-09 09:12:44, 201...  [3, 9, 7]

Statutory warning, this is inefficient in general, because you have a column of lists. 法定警告,这通常效率低下,因为您有一列列表。


If you want to know how I'd store this data, it'd be something like: 如果您想知道如何存储这些数据,它将类似于:

df

   id                 time
0   1  2017-09-05 03:34:51
1   1  2016-03-07 05:24:55
2   2  2016-02-06 03:14:21
3   2  2014-08-09 09:12:44
4   2  2011-05-02 07:43:21

Now, getting the hour is as easy as: 现在,获取时间很简单:

h = pd.to_datetime(df.time).dt.hour
h

0    3
1    5
2    3
3    9
4    7
Name: time, dtype: int64

df['hour'] = h

If you want to perform group-wise computation, you can always use df.groupby . 如果要执行逐组计算,则始终可以使用df.groupby

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM