Pandas: add a column with the most recent values

Question

I have two pandas dataframes, both indexed with datetime entries. df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less than ts.

For example, let's say that df2 is sampled every minute, and there are rows with timestamps 08:00:15, 08:00:47, 08:02:35 in df1. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?

Answer 1

You are describing an asof-join, which was just released in pandas 0.19.

pd.merge_asof(df1, df2, left_index=True, right_index=True)
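As a rough illustration, here is a minimal sketch of the asof-join on the example from the question. The values in column a and the extra column x are made up. It assumes both frames are sorted by their datetime index and a pandas version in which merge_asof accepts left_index and right_index. allow_exact_matches=False is passed because the question asks for timestamps strictly less than ts.

```python
import pandas as pd

# df2: unique timestamps, sampled every minute (values are made up).
df2 = pd.DataFrame(
    {"a": [10, 11, 12]},
    index=pd.to_datetime(
        ["2016-06-30 08:00:00", "2016-06-30 08:01:00", "2016-06-30 08:02:00"]
    ),
)

# df1: timestamps from the question, with one repeated to make the index non-unique.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2016-06-30 08:00:15",
            "2016-06-30 08:00:47",
            "2016-06-30 08:00:47",
            "2016-06-30 08:02:35",
        ]
    ),
)

# For each row of df1, take the last row of df2 whose timestamp is strictly earlier.
merged = pd.merge_asof(
    df1, df2, left_index=True, right_index=True, allow_exact_matches=False
)
print(merged)
```

The 08:00:15 and 08:00:47 rows pick up the 08:00:00 value and the 08:02:35 row picks up the 08:02:00 value, as requested in the question.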

Answer 2

Apply over the rows of df1, and reindex df2 with ffill.

df1['df2.a'] = df1.apply(lambda x: df2.a.reindex([x.name], method='ffill').iloc[0], axis=1).values
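A runnable sketch of the same row-wise idea on the question's example, with made-up values and a hypothetical extra column x. It assumes df2's index is sorted, and it passes method='ffill' to reindex so that the forward fill happens during the lookup itself. Note that this lookup also accepts an exact timestamp match, which differs slightly from the strict "less than ts" wording of the question.

```python
import pandas as pd

# df2: unique timestamps, sampled every minute (values are made up).
df2 = pd.DataFrame(
    {"a": [10, 11, 12]},
    index=pd.to_datetime(
        ["2016-06-30 08:00:00", "2016-06-30 08:01:00", "2016-06-30 08:02:00"]
    ),
)

# df1: timestamps from the question, with one repeated to make the index non-unique.
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2016-06-30 08:00:15",
            "2016-06-30 08:00:47",
            "2016-06-30 08:00:47",
            "2016-06-30 08:02:35",
        ]
    ),
)

# For each row, x.name is the row's timestamp; reindex df2.a at that single
# timestamp, forward-filling from the last df2 entry at or before it.
df1["a"] = df1.apply(
    lambda x: df2["a"].reindex([x.name], method="ffill").iloc[0], axis=1
).values
print(df1)
```

Calling reindex once per row is simple but can be slow on large frames; the asof-join in Answer 1 is likely the better fit there.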


Answer 1: score 2, 2016-06-30 19:25:44

Answer 2: score 1, accepted, 2016-06-30 19:28:36
