Below I have two dataframes. The first dataframe (d1) has a 'Date' index, and the 2nd dataframe (d2) has a 'Date' and 'Name' index.
You'll notice that d1 starts at 2014-04-30 and d2 starts at 2014-01-31.
d1:
            Value
Date
2014-04-30      1
2014-05-31      2
2014-06-30      3
2014-07-31      4
2014-08-31      5
2014-09-30      6
2014-10-31      7
d2:
                 Value
Date       Name
2014-01-31 n1        5
2014-02-30 n1        6
2014-03-30 n1        7
2014-04-30 n1        8
2014-05-31 n2        9
2014-06-30 n2        3
2014-07-31 n2        4
2014-08-31 n2        5
2014-09-30 n2        6
2014-10-31 n2        7
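For anyone who wants to reproduce the two frames, here is a minimal sketch of how they could be built. One assumption is made loudly: 2014-02-30 is not a real calendar date, so the sketch substitutes 2014-02-28 for that row; everything else follows the listings above.

```python
import pandas as pd

# d1: single 'Date' index, starting at 2014-04-30
d1 = pd.DataFrame(
    {"Value": [1, 2, 3, 4, 5, 6, 7]},
    index=pd.DatetimeIndex(
        ["2014-04-30", "2014-05-31", "2014-06-30", "2014-07-31",
         "2014-08-31", "2014-09-30", "2014-10-31"],
        name="Date",
    ),
)

# d2: ('Date', 'Name') MultiIndex, starting at 2014-01-31.
# Assumption: 2014-02-28 stands in for the invalid 2014-02-30.
d2_dates = pd.DatetimeIndex(
    ["2014-01-31", "2014-02-28", "2014-03-30", "2014-04-30", "2014-05-31",
     "2014-06-30", "2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31"]
)
d2_names = ["n1"] * 4 + ["n2"] * 6
d2 = pd.DataFrame(
    {"Value": [5, 6, 7, 8, 9, 3, 4, 5, 6, 7]},
    index=pd.MultiIndex.from_arrays([d2_dates, d2_names], names=["Date", "Name"]),
)
```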
What I want to do is prepend the earlier dates from d2 to d1, and fill the Value column of the prepended rows with the first value from d1.
The result should look like this:
            Value
Date
2014-01-31      1
2014-02-30      1
2014-03-30      1
2014-04-30      1
2014-05-31      2
2014-06-30      3
2014-07-31      4
2014-08-31      5
2014-09-30      6
2014-10-31      7
What is the most efficient or easiest way to do this using pandas?
Probably not very elegant, but this assumes your df2 has a MultiIndex:
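# Align df1 with df2 on 'Date', then back-fill the rows that df1 lacks
df3 = pd.concat((df1, df2.reset_index().set_index('Date')), axis=1).bfill()
df3.index.name = 'Date'
# Rebuild the (Date, Name) index and keep only the first column (df1's Value)
print(df3.set_index([df3.index, df3.Name], drop=True).iloc[:, [0]])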
                 Value
Date       Name
2014-01-31 n1        1
2014-02-30 n1        1
2014-03-30 n1        1
2014-04-30 n1        1
2014-05-31 n2        2
2014-06-30 n2        3
2014-07-31 n2        4
2014-08-31 n2        5
2014-09-30 n2        6
2014-10-31 n2        7
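The output above still carries the Name level, whereas the expected result in the question is indexed by Date only. As a small sketch (the two-row frame here is a hypothetical stand-in for the printed result, not the full data), the extra level could be dropped with `DataFrame.droplevel`:

```python
import pandas as pd

# Hypothetical two-row stand-in for the (Date, Name)-indexed result
idx = pd.MultiIndex.from_arrays(
    [pd.to_datetime(["2014-04-30", "2014-05-31"]), ["n1", "n2"]],
    names=["Date", "Name"],
)
with_name = pd.DataFrame({"Value": [1, 2]}, index=idx)

# Drop the 'Name' level so only 'Date' remains as the index
date_only = with_name.droplevel("Name")
print(date_only)
```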
This is a direct formulation of your problem, and it is quite fast already:
In [126]: def direct(d1, d2):
   .....:     dates2 = d2.index.get_level_values('Date')
   .....:     dates1 = d1.index
   .....:     return d1.reindex(dates2[dates2 < min(dates1)].append(dates1), method='bfill')
   .....:
In [127]: direct(d1, d2)
Out[127]:
            Value
Date
2014-01-31      1
2014-02-28      1
2014-03-30      1
2014-04-30      1
2014-05-31      2
2014-06-30      3
2014-07-31      4
2014-08-31      5
2014-09-30      6
2014-10-31      7
In [128]: %timeit direct(d1, d2)
1000 loops, best of 3: 362 µs per loop
If you are willing to sacrifice some readability for performance, you could compare dates by their internal representation (integers are faster) and do the "backfilling" manually:
In [129]: def fast(d1, d2):
   .....:     dates2 = d2.index.get_level_values('Date')
   .....:     dates1 = d1.index
   .....:     new_dates = dates2[dates2.asi8 < min(dates1.asi8)]
   .....:     new_index = new_dates.append(dates1)
   .....:     new_values = np.concatenate((np.repeat(d1.values[:1], len(new_dates), axis=0), d1.values))
   .....:     return pd.DataFrame(new_values, index=new_index, columns=d1.columns, copy=False)
   .....:
In [130]: %timeit fast(d1, d2)
1000 loops, best of 3: 213 µs per loop
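To check that the two versions agree, here is a self-contained sketch that rebuilds the sample data and compares both results. Two things are assumptions rather than part of the session above: 2014-02-28 is used in place of 2014-02-30 (which is not a valid date), and `.min()` replaces the built-in `min` purely as a stylistic choice.

```python
import numpy as np
import pandas as pd

def direct(d1, d2):
    # Back-fill via reindex: earlier d2 dates pick up d1's first value
    dates2 = d2.index.get_level_values("Date")
    dates1 = d1.index
    return d1.reindex(dates2[dates2 < dates1.min()].append(dates1), method="bfill")

def fast(d1, d2):
    # Compare dates as int64 nanoseconds and repeat d1's first row manually
    dates2 = d2.index.get_level_values("Date")
    dates1 = d1.index
    new_dates = dates2[dates2.asi8 < dates1.asi8.min()]
    new_index = new_dates.append(dates1)
    new_values = np.concatenate(
        (np.repeat(d1.values[:1], len(new_dates), axis=0), d1.values)
    )
    return pd.DataFrame(new_values, index=new_index, columns=d1.columns, copy=False)

d1 = pd.DataFrame(
    {"Value": [1, 2, 3, 4, 5, 6, 7]},
    index=pd.DatetimeIndex(
        ["2014-04-30", "2014-05-31", "2014-06-30", "2014-07-31",
         "2014-08-31", "2014-09-30", "2014-10-31"],
        name="Date",
    ),
)
# Assumption: 2014-02-28 stands in for the invalid 2014-02-30
d2_dates = pd.DatetimeIndex(
    ["2014-01-31", "2014-02-28", "2014-03-30", "2014-04-30", "2014-05-31",
     "2014-06-30", "2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31"]
)
d2 = pd.DataFrame(
    {"Value": [5, 6, 7, 8, 9, 3, 4, 5, 6, 7]},
    index=pd.MultiIndex.from_arrays(
        [d2_dates, ["n1"] * 4 + ["n2"] * 6], names=["Date", "Name"]
    ),
)

a = direct(d1, d2)
b = fast(d1, d2)

# Both approaches should yield the same index and the same values
assert a.index.equals(b.index)
assert np.array_equal(a.values, b.values)
```

Timings will differ by machine and pandas version, so the figures quoted above are best read as relative rather than absolute.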