
Loop through rows in pandas dataframe

I have two dataframes: one has only company names and dates, the other has only timestamps, as shown below.

    creationdate
0   2012-05-01 18:20:27.167000
1   2012-05-01 19:16:08.070000
2   2012-05-01 19:20:07.880000
3   2012-05-01 19:33:02.200000
4   2012-05-01 19:35:09.173000
5   2012-05-01 20:18:55.610000
6   2012-05-01 20:26:27.577000
7   2012-05-01 20:32:34.343000
8   2012-05-01 20:39:31.257000
9   2012-05-01 21:04:50.357000
10  2012-05-01 21:54:18.983000
11  2012-05-02 02:23:53.250000
12  2012-05-02 02:40:27.643000
13  2012-05-02 08:44:28.260000

   sitename        date
0    Google  2012-05-01
1    Google  2012-05-02
2    Google  2012-05-03
3    Google  2012-05-04
4    Google  2012-05-05
5    Google  2012-05-06
6    Google  2012-05-07
7    Google  2012-05-08
8    Google  2012-05-09
9    Google  2012-05-10

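How can I efficiently loop through the second dataframe and pull out the timestamps from the first dataframe that correspond to each date in the second dataframe?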

Merging the two dataframes with an inner join should work:

In [96]: df1['date'] = pd.DatetimeIndex(df1.creationdate).date

In [97]: df2['date'] = pd.DatetimeIndex(df2.date).date

In [98]: df=df1.merge(df2, on='date', how='inner')

In [99]: df
Out[99]: 
                 creationdate        date sitename
0  2012-05-01 18:20:27.167000  2012-05-01   Google
1  2012-05-01 19:16:08.070000  2012-05-01   Google
2  2012-05-01 19:20:07.880000  2012-05-01   Google
3  2012-05-01 19:33:02.200000  2012-05-01   Google
4  2012-05-01 19:35:09.173000  2012-05-01   Google
5  2012-05-01 20:18:55.610000  2012-05-01   Google
6  2012-05-01 20:26:27.577000  2012-05-01   Google
7  2012-05-01 20:32:34.343000  2012-05-01   Google
8  2012-05-01 20:39:31.257000  2012-05-01   Google
9  2012-05-01 21:04:50.357000  2012-05-01   Google
10 2012-05-01 21:54:18.983000  2012-05-01   Google
11 2012-05-02 02:23:53.250000  2012-05-02   Google
12 2012-05-02 02:40:27.643000  2012-05-02   Google
13 2012-05-02 08:44:28.260000  2012-05-02   Google
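A self-contained version of the merge above, using a few of the sample rows. One assumption differs from the session: instead of `.date` (which produces Python `date` objects in an object column), this sketch uses `.dt.normalize()` so the join key stays a datetime64 column; both approaches match timestamps to calendar days.

```python
import pandas as pd

# Sample data: a subset of the timestamps and dates shown in the question
df1 = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 18:20:27.167",
        "2012-05-01 19:16:08.070",
        "2012-05-02 02:23:53.250",
        "2012-05-02 08:44:28.260",
    ])
})
df2 = pd.DataFrame({
    "sitename": ["Google", "Google", "Google"],
    "date": pd.to_datetime(["2012-05-01", "2012-05-02", "2012-05-03"]),
})

# Truncate both sides to midnight so the keys compare as datetime64 values
df1["date"] = df1["creationdate"].dt.normalize()
df2["date"] = df2["date"].dt.normalize()

# Inner join: each timestamp is paired with the df2 row for its calendar day;
# dates in df2 that have no timestamps (here 2012-05-03) are dropped
df = df1.merge(df2, on="date", how="inner")
print(df)
```

Because the join is done once over whole columns, no explicit Python loop over the rows of `df2` is needed.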

Then you can run your analysis on `df`:

In [100]: df['time_diff'] = df.creationdate.diff()

In [101]: df.time_diff
Out[101]: 
0                NaT
1    00:55:40.903000
2    00:03:59.810000
3    00:12:54.320000
4    00:02:06.973000
5    00:43:46.437000
6    00:07:31.967000
7    00:06:06.766000
8    00:06:56.914000
9    00:25:19.100000
10   00:49:28.626000
11   04:29:34.267000
12   00:16:34.393000
13   06:04:00.617000
Name: time_diff, dtype: timedelta64[ns]
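Note that a plain `diff()` also measures the gap across midnight (row 11 above compares a 2012-05-02 timestamp with the last one from 2012-05-01). If you instead want the differences to restart for each date, a grouped diff is one option; this is a sketch on a few of the sample rows, and the `daily_diff` column name is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "creationdate": pd.to_datetime([
        "2012-05-01 20:39:31.257",
        "2012-05-01 21:04:50.357",
        "2012-05-01 21:54:18.983",
        "2012-05-02 02:23:53.250",
        "2012-05-02 02:40:27.643",
    ])
})
df["date"] = df["creationdate"].dt.normalize()

# Plain diff: the first 2012-05-02 row measures the gap across midnight
df["time_diff"] = df["creationdate"].diff()

# Grouped diff: the first timestamp of each date gets NaT instead
df["daily_diff"] = df.groupby("date")["creationdate"].diff()
print(df[["creationdate", "time_diff", "daily_diff"]])
```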

Of course, your `creationdate` column must be `datetime64[ns]`, NOT a string. Otherwise you need to convert it first, e.g. with `pd.DatetimeIndex(df.creationdate)`.
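If the timestamps were read in as text (for example from a CSV), `pd.to_datetime` is another standard way to do the conversion. A minimal sketch with two of the sample values:

```python
import pandas as pd

# creationdate stored as strings, e.g. as read from a CSV file
df = pd.DataFrame({
    "creationdate": ["2012-05-01 18:20:27.167000", "2012-05-01 19:16:08.070000"]
})

# Parse the strings into a datetime column so .diff() and the .dt accessor work
df["creationdate"] = pd.to_datetime(df["creationdate"])
print(df["creationdate"].diff())
```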
