
Replacing Values in Two Dataframes by Date Index - Python Pandas

I need to replace values in one dataframe with values from another dataframe wherever their date indexes match. For specific dates (the 5th through the 10th), column B's values need to be replaced with those in the second dataframe. I've looked at merge, join, replace, etc., but cannot work out how to do this.

import pandas as pd
import numpy as np

list1 = [10,80,6,38,41,54,12,280,46,21,46,22]
list2 = [4,3,22,6,'NA','NA','NA','NA','NA','NA',452,13]
list3 = ['2016-01-01', '2016-01-02','2016-01-03','2016-01-04','2016-01-05','2016-01-06',
         '2016-01-07','2016-01-08','2016-01-09','2016-01-10','2016-01-11','2016-01-12',]

dat = pd.DataFrame({'A' : list1, 'B' : list2, 'Date' : list3}, columns = ['A', 'B', 'Date'])
dat['Date'] = pd.to_datetime(dat['Date'], format = '%Y-%m-%d')
dat = dat.set_index('Date')
print(dat)

The values of column B from 2016-01-05 to 2016-01-10 need to be replaced with values from the second dataframe:

              A    B
Date                
2016-01-01   10    4
2016-01-02   80    3
2016-01-03    6   22
2016-01-04   38    6
2016-01-05   41   NA
2016-01-06   54   NA
2016-01-07   12   NA
2016-01-08  280   NA
2016-01-09   46   NA
2016-01-10   21   NA
2016-01-11   46  452
2016-01-12   22   13

Here is the second dataframe, whose values need to be "mapped" into the first dataframe:

list4 = [78,15,16,79,71,90]
list5 = ['2016-01-05','2016-01-06','2016-01-07','2016-01-08','2016-01-09','2016-01-10']
dat2 = pd.DataFrame({'B' : list4, 'Date' : list5}, columns = ['B', 'Date'])
dat2['Date'] = pd.to_datetime(dat2['Date'], format = '%Y-%m-%d')
dat2 = dat2.set_index('Date')
print(dat2)

             B
Date          
2016-01-05  78
2016-01-06  15
2016-01-07  16
2016-01-08  79
2016-01-09  71
2016-01-10  90

The final output should look like:

              A    B
Date                
2016-01-01   10    4
2016-01-02   80    3
2016-01-03    6   22
2016-01-04   38    6
2016-01-05   41   78
2016-01-06   54   15
2016-01-07   12   16
2016-01-08  280   79
2016-01-09   46   71
2016-01-10   21   90
2016-01-11   46  452
2016-01-12   22   13

Any help would be greatly appreciated! Thank you.

One way is to use combine_first:

df1 = dat2.combine_first(dat)

print(df1)

              A      B
Date
2016-01-01   10    4.0
2016-01-02   80    3.0
2016-01-03    6   22.0
2016-01-04   38    6.0
2016-01-05   41   78.0
2016-01-06   54   15.0
2016-01-07   12   16.0
2016-01-08  280   79.0
2016-01-09   46   71.0
2016-01-10   21   90.0
2016-01-11   46  452.0
2016-01-12   22   13.0
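
To make the direction of combine_first concrete, here is a minimal sketch. It assumes the 'NA' strings from the question are replaced with np.nan (so column B is numeric), and it builds the index with pd.date_range rather than parsing strings. The calling frame's non-null values take precedence, so dat2.combine_first(dat) lets dat2 override dat, while dat.combine_first(dat2) only fills the gaps in dat. With this data both give the same column B.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames; np.nan replaces the 'NA' strings (an assumption).
dates = pd.date_range('2016-01-01', periods=12, freq='D', name='Date')
dat = pd.DataFrame({'A': [10, 80, 6, 38, 41, 54, 12, 280, 46, 21, 46, 22],
                    'B': [4, 3, 22, 6] + [np.nan] * 6 + [452, 13]},
                   index=dates)
dat2 = pd.DataFrame({'B': [78, 15, 16, 79, 71, 90]}, index=dates[4:10])

# dat2's non-null values win; all other cells come from dat.
df1 = dat2.combine_first(dat)

# Reversed call: dat's values are kept and only its missing cells are filled.
df2 = dat.combine_first(dat2)

print(df1['B'].tolist())
print(df2['B'].tolist())
```

If dat already held real values on the overlapping dates, the two calls would differ, so pick the order based on which frame should win.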

Or use DataFrame.update:

dat.update(dat2)
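
Two details of update are easy to miss: it modifies the frame in place and returns None, and it only copies non-NA values from the other frame. A small sketch, again assuming np.nan in place of the 'NA' strings. The frame named hole is hypothetical and exists only to show that a NaN in the other frame does not overwrite anything.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames; np.nan replaces the 'NA' strings (an assumption).
dates = pd.date_range('2016-01-01', periods=12, freq='D', name='Date')
dat = pd.DataFrame({'A': [10, 80, 6, 38, 41, 54, 12, 280, 46, 21, 46, 22],
                    'B': [4, 3, 22, 6] + [np.nan] * 6 + [452, 13]},
                   index=dates)
dat2 = pd.DataFrame({'B': [78, 15, 16, 79, 71, 90]}, index=dates[4:10])

# update aligns on index and column labels and changes dat in place.
ret = dat.update(dat2)
print(ret)  # None: do not write dat = dat.update(dat2)

# Hypothetical frame with a NaN on 2016-01-01: NaN values are skipped,
# so the existing value in dat is left alone.
hole = pd.DataFrame({'B': [np.nan]}, index=dates[:1])
dat.update(hole)

print(dat['B'].tolist())
```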

Or you could also use .loc:

dat.loc[dat2.index, 'B'] = dat2.loc[:, 'B'] 
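
This works because assigning a Series through .loc aligns it on index labels, not on position. The sketch below illustrates that by deliberately reversing the row order of the right-hand side; the name reversed_b is made up for the example, and np.nan again stands in for the 'NA' strings.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames; np.nan replaces the 'NA' strings (an assumption).
dates = pd.date_range('2016-01-01', periods=12, freq='D', name='Date')
dat = pd.DataFrame({'A': [10, 80, 6, 38, 41, 54, 12, 280, 46, 21, 46, 22],
                    'B': [4, 3, 22, 6] + [np.nan] * 6 + [452, 13]},
                   index=dates)
dat2 = pd.DataFrame({'B': [78, 15, 16, 79, 71, 90]}, index=dates[4:10])

# Same values as dat2['B'], but with the rows in reverse order.
reversed_b = dat2['B'].iloc[::-1]

# Each value still lands on its own date because pandas matches on labels.
dat.loc[dat2.index, 'B'] = reversed_b

print(dat.loc[dat2.index, 'B'].tolist())
```

Passing a plain list or NumPy array instead of a Series would be assigned positionally, so keep the right-hand side as a Series when you rely on this.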

You can update cells by their location (index and column) to precisely target what you update:

# Only the rows for these two dates are updated
replace = [pd.to_datetime(d) for d in ['2016-01-05', '2016-01-10']]
dat.loc[replace, 'B'] = dat2.loc[replace, 'B']

This ensures that you only touch the index labels you expect, and only the columns you expect.

EDIT: Here is the documentation for that .loc method. I'd give it a look; it's a very versatile tool.

EDIT2: I saw you're actually replacing a slice of time, not just those two locations. This can also be achieved with .loc:

start, end = pd.to_datetime('2016-01-05'), pd.to_datetime('2016-01-10')
dat.loc[start:end, 'B'] = dat2.loc[start:end, 'B']
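
A label slice with .loc includes both endpoints, and a DatetimeIndex also accepts plain date strings, so the Timestamp conversion is optional. Here is a sketch that does the slice assignment and then checks that nothing outside the slice changed. The name before is introduced only for the comparison, and np.nan replaces the 'NA' strings from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames; np.nan replaces the 'NA' strings (an assumption).
dates = pd.date_range('2016-01-01', periods=12, freq='D', name='Date')
dat = pd.DataFrame({'A': [10, 80, 6, 38, 41, 54, 12, 280, 46, 21, 46, 22],
                    'B': [4, 3, 22, 6] + [np.nan] * 6 + [452, 13]},
                   index=dates)
dat2 = pd.DataFrame({'B': [78, 15, 16, 79, 71, 90]}, index=dates[4:10])

before = dat.copy()  # snapshot for comparison

# String labels work on a DatetimeIndex; the slice covers the 5th through the 10th.
dat.loc['2016-01-05':'2016-01-10', 'B'] = dat2.loc['2016-01-05':'2016-01-10', 'B']

# Number of rows selected by the slice (both endpoints are included).
n_rows = len(dat.loc['2016-01-05':'2016-01-10'])

# Rows outside the slice and all of column A should be untouched.
outside_same = dat.drop(dat2.index).equals(before.drop(dat2.index))
a_same = dat['A'].equals(before['A'])

print(n_rows, outside_same, a_same)
```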
