I need to replace values in one dataframe by the mutual date index of another dataframe. Here there are specific dates (the 5th through the 10th), where column B's values need to be substituted with those in dataframe2. I've looked at merges / joins / replace / etc, but cannot find out how to do this.
import pandas as pd
import numpy as np
list1 = [10,80,6,38,41,54,12,280,46,21,46,22]
list2 = [4,3,22,6,'NA','NA','NA','NA','NA','NA',452,13]
list3 = ['2016-01-01', '2016-01-02','2016-01-03','2016-01-04','2016-01-05','2016-01-06',
'2016-01-07','2016-01-08','2016-01-09','2016-01-10','2016-01-11','2016-01-12',]
dat = pd.DataFrame({'A' : list1, 'B' : list2, 'Date' : list3}, columns = ['A', 'B', 'Date'])
dat['Date'] = pd.to_datetime(dat['Date'], format = '%Y-%m-%d')
dat = dat.set_index('Date')
print(dat)
Values 2016-01-05 to 2016-01-10
need to be replaced with values in the second dataframe:
A B
Date
2016-01-01 10 4
2016-01-02 80 3
2016-01-03 6 22
2016-01-04 38 6
2016-01-05 41 NA
2016-01-06 54 NA
2016-01-07 12 NA
2016-01-08 280 NA
2016-01-09 46 NA
2016-01-10 21 NA
2016-01-11 46 452
2016-01-12 22 13
Here is the second dataframe, where these values need to be "mapped" into the first dataframe:
list4 = [78,15,16,79,71,90]
list5 = ['2016-01-05','2016-01-06','2016-01-07','2016-01-08','2016-01-09','2016-01-10']
dat2 = pd.DataFrame({'B' : list4, 'Date' : list5}, columns = ['B', 'Date'])
dat2['Date'] = pd.to_datetime(dat2['Date'], format = '%Y-%m-%d')
dat2 = dat2.set_index('Date')
print(dat2)
B
Date
2016-01-05 78
2016-01-06 15
2016-01-07 16
2016-01-08 79
2016-01-09 71
2016-01-10 90
The final output should look like:
A B
Date
2016-01-01 10 4
2016-01-02 80 3
2016-01-03 6 22
2016-01-04 38 6
2016-01-05 41 78
2016-01-06 54 15
2016-01-07 12 16
2016-01-08 280 79
2016-01-09 46 71
2016-01-10 21 90
2016-01-11 46 452
2016-01-12 22 13
Any help would be greatly appreciated! Thank you.
One way using combine_first
df1 = dat2.combine_first(dat)
print (df1)
A B
Date
2016-01-01 10 4.0
2016-01-02 80 3.0
2016-01-03 6 22.0
2016-01-04 38 6.0
2016-01-05 41 78.0
2016-01-06 54 15.0
2016-01-07 12 16.0
2016-01-08 280 79.0
2016-01-09 46 71.0
2016-01-10 21 90.0
2016-01-11 46 452.0
2016-01-12 22 13.0
Or using DataFrame.update
dat.update(dat2)
Or you could also use .loc
dat.loc[dat2.index, 'B'] = dat2.loc[:, 'B']
You can update cells by their location (index and column) to precisely target what you update:
replace = [pd.to_datetime(d) for d in ['2016-01-05', '2016-01-10']
dat.loc[replace, 'B'] = dat2.loc[replace, 'B']
This ensures that you only touch the indices you expect, and only touch the columns you expect.
EDIT: Here is the documentation for that .loc
method. I'd give it a look, it's a very versatile tool.
EDIT2: I saw you're actually replacing a slice of time, not just those two values locations. This can also be achieved with .loc
:
start, end = pd.to_datetime('2016-01-05'), pd.to_datetime('2016-01-10')
dat.loc[start:end, 'B'] = dat2.loc[start:end, 'B']
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.