简体   繁体   中英

Pandas: add column with the most recent values

I have two pandas dataframes, both index with datetime entries. The df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts , df1.a should contain the most recent value of df2.a whose timestamp is less then ts .

For example, let's say that df2 is sampled every minute, and there are rows with timestamps 08:00:15 , 08:00:47 , 08:02:35 in df1 . In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?

您正在描述一个asof-join ,它刚刚在pandas 0.19中发布

pd.merge(df1, df2, left_on='ts', right_on='a')

适用于df1的行,并使用ffill在df2上重新编制索引。

df1['df2.a'] = df1.apply(lambda x: pd.Series(df2.a.reindex([x.name]).ffill().values), axis=1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM