Pandas: add column with the most recent values
I have two pandas DataFrames, both indexed by datetime entries. df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less than ts.
For example, let's say that df2 is sampled every minute, and df1 has rows with timestamps 08:00:15, 08:00:47, and 08:02:35. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?
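You are describing an asof join, which was added in pandas 0.19 as pd.merge_asof. Since both frames are indexed by timestamp, join on the indexes (both must be sorted); pass allow_exact_matches=False if a df2 row with exactly the same timestamp should not match.
pd.merge_asof(df1, df2, left_index=True, right_index=True)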
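To make the asof join concrete, here is a minimal sketch on made-up data shaped like the example in the question. The dates, the values in `a`, and the extra column `x` are assumptions for illustration only; `allow_exact_matches=False` is used because the question asks for timestamps strictly less than `ts`.

```python
import pandas as pd

# Hypothetical data: df2 is sampled once per minute (unique index),
# df1 has irregular timestamps, one of them repeated (non-unique index).
df2 = pd.DataFrame(
    {"a": [10, 11, 12]},
    index=pd.to_datetime(
        ["2016-01-01 08:00:00", "2016-01-01 08:01:00", "2016-01-01 08:02:00"]
    ),
)
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2016-01-01 08:00:15",
            "2016-01-01 08:00:47",
            "2016-01-01 08:00:47",
            "2016-01-01 08:02:35",
        ]
    ),
)

# merge_asof requires both join keys to be sorted. The default direction is
# backward, so each df1 row takes the last df2 row at an earlier timestamp.
result = pd.merge_asof(
    df1.sort_index(),
    df2[["a"]].sort_index(),
    left_index=True,
    right_index=True,
    allow_exact_matches=False,  # strictly earlier timestamps only
)
print(result)
```

In this sketch the three rows between 08:00 and 08:01 pick up the 08:00:00 value and the last row picks up the 08:02:00 value, which is the behaviour the question describes.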
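Apply over the rows of df1 and reindex df2 with a forward fill. Note that a forward fill also matches a df2 row whose timestamp equals ts.
df1['df2.a'] = df1.apply(lambda x: df2.a.reindex([x.name], method='ffill').iloc[0], axis=1)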
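The row-wise apply can be slow on large frames. The same forward-fill idea can probably be expressed as a single reindex over the whole of `df1.index`, as sketched below on made-up data (the dates, values, and the column `x` are assumptions). This relies on df2's index being unique and sorted, and like any forward fill it treats an equal timestamp as a match.

```python
import pandas as pd

# Hypothetical data shaped like the question's example.
df2 = pd.DataFrame(
    {"a": [10, 11, 12]},
    index=pd.to_datetime(
        ["2016-01-01 08:00:00", "2016-01-01 08:01:00", "2016-01-01 08:02:00"]
    ),
)
df1 = pd.DataFrame(
    {"x": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2016-01-01 08:00:15",
            "2016-01-01 08:00:47",
            "2016-01-01 08:00:47",
            "2016-01-01 08:02:35",
        ]
    ),
)

# Look up every df1 timestamp in df2 at once, padding forward from the
# last df2 timestamp at or before it. Assigning the raw values sidesteps
# index alignment on df1's duplicated labels.
df1["a"] = df2["a"].sort_index().reindex(df1.index, method="ffill").to_numpy()
print(df1)
```

If exact timestamp matches must be excluded, the asof join with `allow_exact_matches=False` is the simpler choice.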