Prepend values to a pandas DataFrame based on an index level of another DataFrame

Below I have two DataFrames. The first (d1) has a 'Date' index, and the second (d2) has a MultiIndex with the levels 'Date' and 'Name'.
Note that d1 starts at 2014-04-30, while d2 starts at 2014-01-31.

d1:

            Value
Date              
2014-04-30      1
2014-05-31      2
2014-06-30      3
2014-07-31      4
2014-08-31      5
2014-09-30      6
2014-10-31      7

d2:

                    Value
Date        Name      
2014-01-31  n1      5
2014-02-30  n1      6
2014-03-30  n1      7
2014-04-30  n1      8
2014-05-31  n2      9
2014-06-30  n2      3
2014-07-31  n2      4
2014-08-31  n2      5
2014-09-30  n2      6
2014-10-31  n2      7
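
For anyone who wants to reproduce the two frames, here is a minimal sketch of how they could be built. It assumes the dates are real timestamps, and because 2014-02-30 is not a valid calendar date it substitutes 2014-02-28 for that row (an assumption, not part of the original data).

```python
import pandas as pd

# Assumption: 2014-02-30 in the listing is replaced by 2014-02-28,
# since February 30 cannot be parsed as a timestamp.
dates = pd.to_datetime([
    "2014-01-31", "2014-02-28", "2014-03-30", "2014-04-30", "2014-05-31",
    "2014-06-30", "2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31",
])

# d1: single 'Date' index, starting at 2014-04-30
d1 = pd.DataFrame(
    {"Value": [1, 2, 3, 4, 5, 6, 7]},
    index=pd.DatetimeIndex(dates[3:], name="Date"),
)

# d2: ('Date', 'Name') MultiIndex, starting at 2014-01-31
names = ["n1"] * 4 + ["n2"] * 6
d2 = pd.DataFrame(
    {"Value": [5, 6, 7, 8, 9, 3, 4, 5, 6, 7]},
    index=pd.MultiIndex.from_arrays([dates, names], names=["Date", "Name"]),
)

print(d1)
print(d2)
```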

What I want to do is prepend the dates from d2 that precede d1's first date, and fill the Value of those prepended rows with the first value from d1.

The result should look like this:

            Value
Date
2014-01-31      1
2014-02-30      1
2014-03-30      1
2014-04-30      1
2014-05-31      2
2014-06-30      3
2014-07-31      4
2014-08-31      5
2014-09-30      6
2014-10-31      7

What is the most efficient or easiest way to do this using pandas?

Probably not very elegant, but this works if your df2 has a MultiIndex (df1 and df2 here are the question's d1 and d2):

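import pandas as pd
# Align both frames on 'Date', then back-fill df1's missing early values
df3 = pd.concat((df1, df2.reset_index().set_index('Date')), axis=1).bfill()
df3.index.name = 'Date'
# Restore the (Date, Name) index and keep only the first 'Value' column (from df1)
print(df3.set_index([df3.index, df3.Name], drop=True).iloc[:, [0]])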


                 Value
Date       Name       
2014-01-31 n1        1
2014-02-30 n1        1
2014-03-30 n1        1
2014-04-30 n1        1
2014-05-31 n2        2
2014-06-30 n2        3
2014-07-31 n2        4
2014-08-31 n2        5
2014-09-30 n2        6
2014-10-31 n2        7
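
The output above still carries the 'Name' level, whereas the question asks for a frame indexed by 'Date' alone. A minimal sketch of dropping that level follows; the small `result` frame is a hypothetical stand-in for the output above, not the answer's actual object.

```python
import pandas as pd

# Hypothetical stand-in for the result printed above (a few rows only).
idx = pd.MultiIndex.from_arrays(
    [pd.to_datetime(["2014-01-31", "2014-04-30", "2014-05-31"]),
     ["n1", "n1", "n2"]],
    names=["Date", "Name"],
)
result = pd.DataFrame({"Value": [1, 1, 2]}, index=idx)

# Drop the 'Name' level so that only 'Date' remains as the index.
by_date = result.droplevel("Name")
print(by_date)
```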

This is a direct formulation of your problem, and it is quite fast already:

In [126]: def direct(d1, d2):
   .....:     dates2 = d2.index.get_level_values('Date')
   .....:     dates1 = d1.index
   .....:     return d1.reindex(dates2[dates2 < min(dates1)].append(dates1), method='bfill')
   .....: 

In [127]: direct(d1, d2)
Out[127]: 
            Value
Date             
2014-01-31      1
2014-02-28      1
2014-03-30      1
2014-04-30      1
2014-05-31      2
2014-06-30      3
2014-07-31      4
2014-08-31      5
2014-09-30      6
2014-10-31      7

In [128]: %timeit direct(d1, d2)
1000 loops, best of 3: 362 µs per loop
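
To see why `reindex(..., method='bfill')` gives the prepended rows the first value, here is a small sketch on a toy Series (the values 10 and 20 are made up for illustration). Each label that is missing from the original index takes the value of the next existing label.

```python
import pandas as pd

# Toy data: two observations, the first one on 2014-04-30.
s = pd.Series([10, 20], index=pd.to_datetime(["2014-04-30", "2014-05-31"]))

# The new index adds one earlier date that s does not contain.
new_index = pd.to_datetime(["2014-01-31", "2014-04-30", "2014-05-31"])

# method='bfill' fills each new label with the next existing observation,
# so 2014-01-31 receives the value stored at 2014-04-30.
filled = s.reindex(new_index, method="bfill")
print(filled)
```

Note that `reindex` with a fill method expects the original index to be sorted, which holds for d1 in the question.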

If you are willing to sacrifice some readability for performance, you could compare dates by their internal representation (integers are faster) and do the "backfilling" manually:

In [129]: def fast(d1, d2):
   .....:     dates2 = d2.index.get_level_values('Date')
   .....:     dates1 = d1.index
   .....:     new_dates = dates2[dates2.asi8 < min(dates1.asi8)]
   .....:     new_index = new_dates.append(dates1)
   .....:     new_values = np.concatenate((np.repeat(d1.values[:1], len(new_dates), axis=0), d1.values))
   .....:     return pd.DataFrame(new_values, index=new_index, columns=d1.columns, copy=False)
   .....: 

In [130]: %timeit fast(d1, d2)
1000 loops, best of 3: 213 µs per loop
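
For completeness, here is a self-contained sketch of the manual approach as a plain script, with a check that it produces the expected values. The frames are rebuilt under the assumption that 2014-02-30 stands for 2014-02-28, and `asi8` is used only as the integer view of the timestamps that the answer relies on. Timings will vary by machine and pandas version, so none are claimed here.

```python
import numpy as np
import pandas as pd

# Assumption: 2014-02-30 is replaced by 2014-02-28 (not a valid date otherwise).
dates = pd.to_datetime([
    "2014-01-31", "2014-02-28", "2014-03-30", "2014-04-30", "2014-05-31",
    "2014-06-30", "2014-07-31", "2014-08-31", "2014-09-30", "2014-10-31",
])
d1 = pd.DataFrame({"Value": [1, 2, 3, 4, 5, 6, 7]},
                  index=pd.DatetimeIndex(dates[3:], name="Date"))
d2 = pd.DataFrame(
    {"Value": [5, 6, 7, 8, 9, 3, 4, 5, 6, 7]},
    index=pd.MultiIndex.from_arrays(
        [dates, ["n1"] * 4 + ["n2"] * 6], names=["Date", "Name"]),
)

def fast(d1, d2):
    dates2 = d2.index.get_level_values("Date")
    dates1 = d1.index
    # Compare the integer representation of the timestamps.
    new_dates = dates2[dates2.asi8 < dates1.asi8.min()]
    new_index = new_dates.append(dates1)
    # Manual "backfill": repeat d1's first row once per prepended date.
    head = np.repeat(d1.values[:1], len(new_dates), axis=0)
    new_values = np.concatenate((head, d1.values))
    return pd.DataFrame(new_values, index=new_index, columns=d1.columns)

out = fast(d1, d2)
print(out)
```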


 