
Replacing Values in Two Dataframes by Date Index - Python Pandas

I need to replace values in one dataframe with values from another, matched on their shared date index. For specific dates (the 5th through the 10th), column B's values need to be replaced with those from the second dataframe. I've looked at merge, join, replace, etc., but can't work out how to do this.

import pandas as pd
import numpy as np

list1 = [10,80,6,38,41,54,12,280,46,21,46,22]
list2 = [4,3,22,6,'NA','NA','NA','NA','NA','NA',452,13]
list3 = ['2016-01-01', '2016-01-02','2016-01-03','2016-01-04','2016-01-05','2016-01-06',
         '2016-01-07','2016-01-08','2016-01-09','2016-01-10','2016-01-11','2016-01-12',]

dat = pd.DataFrame({'A' : list1, 'B' : list2, 'Date' : list3}, columns = ['A', 'B', 'Date'])
dat['Date'] = pd.to_datetime(dat['Date'], format = '%Y-%m-%d')
dat = dat.set_index('Date')
print(dat)

Values 2016-01-05 to 2016-01-10 need to be replaced with values in the second dataframe:

              A    B
Date                
2016-01-01   10    4
2016-01-02   80    3
2016-01-03    6   22
2016-01-04   38    6
2016-01-05   41   NA
2016-01-06   54   NA
2016-01-07   12   NA
2016-01-08  280   NA
2016-01-09   46   NA
2016-01-10   21   NA
2016-01-11   46  452
2016-01-12   22   13

Here is the second dataframe, where these values need to be "mapped" into the first dataframe:

list4 = [78,15,16,79,71,90]
list5 = ['2016-01-05','2016-01-06','2016-01-07','2016-01-08','2016-01-09','2016-01-10']
dat2 = pd.DataFrame({'B' : list4, 'Date' : list5}, columns = ['B', 'Date'])
dat2['Date'] = pd.to_datetime(dat2['Date'], format = '%Y-%m-%d')
dat2 = dat2.set_index('Date')
print(dat2)

             B
Date          
2016-01-05  78
2016-01-06  15
2016-01-07  16
2016-01-08  79
2016-01-09  71
2016-01-10  90

The final output should look like:

              A    B
Date                
2016-01-01   10    4
2016-01-02   80    3
2016-01-03    6   22
2016-01-04   38    6
2016-01-05   41   78
2016-01-06   54   15
2016-01-07   12   16
2016-01-08  280   79
2016-01-09   46   71
2016-01-10   21   90
2016-01-11   46  452
2016-01-12   22   13

Any help would be greatly appreciated! Thank you.

One way is to use combine_first, which keeps dat2's values where they exist and fills everything else from dat:

df1 = dat2.combine_first(dat)

print(df1)

              A      B
Date
2016-01-01   10    4.0
2016-01-02   80    3.0
2016-01-03    6   22.0
2016-01-04   38    6.0
2016-01-05   41   78.0
2016-01-06   54   15.0
2016-01-07   12   16.0
2016-01-08  280   79.0
2016-01-09   46   71.0
2016-01-10   21   90.0
2016-01-11   46  452.0
2016-01-12   22   13.0
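
To see which frame wins in combine_first, here is a minimal sketch on a trimmed-down copy of the data. The names small and patch are made up for illustration, and real NaN is used in place of the 'NA' strings from the question (an assumption, to keep the dtypes simple).

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for dat and dat2 (hypothetical names, fewer rows)
small = pd.DataFrame(
    {'A': [38, 41, 54, 12], 'B': [6.0, np.nan, np.nan, 452.0]},
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07']),
)
patch = pd.DataFrame(
    {'B': [78, 15]},
    index=pd.to_datetime(['2016-01-05', '2016-01-06']),
)

# The caller (patch) has priority; rows and columns it lacks come from small
combined = patch.combine_first(small)
print(combined)
```

Note that the 'NA' entries in the question are strings, not missing values, so calling it the other way round (dat.combine_first(dat2)) would keep them. That is why dat2 is the caller above.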

Or use DataFrame.update, which modifies dat in place:

dat.update(dat2)
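
A small sketch of what update does, again on made-up stand-ins (small and patch are hypothetical names, and NaN replaces the 'NA' strings as an assumption):

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(
    {'A': [38, 41, 54, 12], 'B': [6.0, np.nan, np.nan, 452.0]},
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07']),
)
patch = pd.DataFrame(
    {'B': [78.0, 15.0]},
    index=pd.to_datetime(['2016-01-05', '2016-01-06']),
)

# update() changes small in place and returns None
returned = small.update(patch)
print(returned)
print(small)
```

Because update returns None, writing dat = dat.update(dat2) would throw the data away. It also only copies non-NA values from the other frame, so NaN entries in dat2 would leave dat's values untouched.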

Or assign through .loc, selecting the rows by dat2's index:

dat.loc[dat2.index, 'B'] = dat2.loc[:, 'B'] 
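
This assignment matches rows by date rather than by position, because pandas aligns the right-hand Series on its index before writing. A minimal sketch, with illustrative names and the replacement rows deliberately listed in reverse order:

```python
import numpy as np
import pandas as pd

small = pd.DataFrame(
    {'A': [38, 41, 54, 12], 'B': [6.0, np.nan, np.nan, 452.0]},
    index=pd.to_datetime(['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07']),
)
# Same dates as the gaps in small, but in reverse order
patch = pd.DataFrame(
    {'B': [15.0, 78.0]},
    index=pd.to_datetime(['2016-01-06', '2016-01-05']),
)

# Each value lands on its own date, regardless of row order in patch
small.loc[patch.index, 'B'] = patch['B']
print(small)
```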

You can update cells by their location (index and column) to precisely target what you update:

replace = [pd.to_datetime(d) for d in ['2016-01-05', '2016-01-10']]
dat.loc[replace, 'B'] = dat2.loc[replace, 'B']

This ensures that you only touch the indices you expect, and only touch the columns you expect.

EDIT: See the pandas documentation for .loc. It's worth a look; it's a very versatile tool.

EDIT2: I see you're actually replacing a range of dates, not just those two rows. This can also be done with .loc, since a label slice includes both endpoints:

start, end = pd.to_datetime('2016-01-05'), pd.to_datetime('2016-01-10')
dat.loc[start:end, 'B'] = dat2.loc[start:end, 'B']
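
The difference between a list of labels and a label slice can be sketched as follows. The frames here are small made-up stand-ins (hypothetical names, NaN instead of the 'NA' strings), not the question's data.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2016-01-04', periods=5, freq='D')  # Jan 4 through Jan 8
small = pd.DataFrame(
    {'A': [38, 41, 54, 12, 280], 'B': [6.0, np.nan, np.nan, np.nan, 452.0]},
    index=dates,
)
patch = pd.DataFrame({'B': [78, 15, 16]}, index=dates[1:4])  # Jan 5 through Jan 7

start, end = pd.Timestamp('2016-01-05'), pd.Timestamp('2016-01-07')

# A list picks exactly the listed labels
n_list = len(small.loc[[start, end]])
# A slice picks everything from start to end, both ends included
n_slice = len(small.loc[start:end])
print(n_list, n_slice)

small.loc[start:end, 'B'] = patch.loc[start:end, 'B']
print(small)
```

Unlike positional slicing, the end label is part of the selection, so there is no need to add a day to end.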
