
Python Pandas Fill Dataframe with another DataFrame

I have a dataframe

x = pd.DataFrame(index = ['wkdy','hr'],columns=['c1','c2','c3'])

This should give 168 rows of data in the dataframe: 7 weekdays with 24 hours in each day. I have another dataframe

import numpy as np
import pandas as pd

dates = pd.date_range('20090101',periods = 10000, freq = 'H')
y = pd.DataFrame(np.random.randn(10000, 3), index = dates, columns = ['c1','c2','c3'])
y['hr'] = y.index.hour
y['wkdy'] = y.index.weekday

I want to fill 'y' with data from 'x', so that every row with the same weekday and hour gets the same data but keeps its own datestamp. The only way I know is to loop through the dates and fill in the values. Is there a faster, more efficient way to do this? My solution (rather crude, to say the least) iterates over the entire dataframe y row by row and fills each row from dataframe x through a lookup.

for r in range(0,len(y)):
    h = int(y.iloc[r]['hr'])
    w = int(y.iloc[r]['wkdy'])
    y.iloc[r] = x.loc[(w,h)]

Your dataframe x doesn't have 168 rows but looks like

        c1  c2  c3
wkdy    NaN NaN NaN
hr      NaN NaN NaN

and you can't index it with a tuple as in x.loc[(w,h)]. What you probably had in mind was something like

x = pd.DataFrame(
    index=pd.MultiIndex.from_product(
        [range(7), range(24)], names=['wkdy','hr']),
    columns=['c1','c2','c3'],
    data=np.arange(3 * 168).reshape(3, 168).T)
x
              c1   c2   c3
wkdy    hr          
0       0     0    168  336
        1     1    169  337
...     ...   ...  ...  ...
6       22    166  334  502
        23    167  335  503

168 rows × 3 columns
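
As a quick check of the point above, here is a minimal sketch comparing the two constructors. The name x_flat is only an illustrative label for the dataframe from the question. The values are the same np.arange filler used above.

```python
import numpy as np
import pandas as pd

# The question's constructor: two index labels, so only two rows of NaN.
# (x_flat is a hypothetical name used here for comparison.)
x_flat = pd.DataFrame(index=['wkdy', 'hr'], columns=['c1', 'c2', 'c3'])

# The MultiIndex version: one row per (weekday, hour) pair.
x = pd.DataFrame(
    index=pd.MultiIndex.from_product(
        [range(7), range(24)], names=['wkdy', 'hr']),
    columns=['c1', 'c2', 'c3'],
    data=np.arange(3 * 168).reshape(3, 168).T)

print(x_flat.shape)   # (2, 3)
print(x.shape)        # (168, 3)

# A (weekday, hour) tuple now selects a single row as a Series.
row = x.loc[(3, 0)]
print(row.tolist())   # [72, 240, 408]
```

The row for weekday 3, hour 0 is at position 3 * 24 + 0 = 72, which is why c1 is 72 there.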

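Now your loop will work, although a more pythonic version would look like this:

cols = ['c1', 'c2', 'c3']
for idx, row in y.iterrows():
    # iterrows() returns each row as floats, so cast the keys back to int
    y.loc[idx, cols] = x.loc[(int(row.wkdy), int(row.hr))].values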

However, iterating over a dataframe row by row is very expensive, so you should look for a vectorized solution. Here that means merging the two frames and removing the unwanted columns:

y = (x.merge(y.reset_index(), on=['wkdy', 'hr'])
      .set_index('index')
      .sort_index()
      .iloc[:,:-3])
y
                    wkdy    hr   c1_x   c2_x    c3_x
index                   
2009-01-01 00:00:00 3       0    72     240     408
2009-01-01 01:00:00 3       1    73     241     409
...                 ...     ...  ...    ...     ...
2010-02-21 14:00:00 6       14   158    326     494
2010-02-21 15:00:00 6       15   159    327     495

10000 rows × 5 columns

Now y is a dataframe whose columns c1_x, c2_x, c3_x hold the data from dataframe x for the rows where y.wkdy==x.wkdy and y.hr==x.hr. For this example, merging is roughly 1000 times faster than looping.
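
The merge above relies on column positions (.iloc[:,:-3]) and on the _x suffixes. A possible variant, sketched here under a few assumptions, selects the key columns by name before merging, so no suffixes appear and nothing has to be dropped afterwards. The names filled and timestamp are illustrative choices, not part of the original code. The frequency is passed as a pd.Timedelta only to avoid depending on a particular frequency alias.

```python
import numpy as np
import pandas as pd

# Rebuild the example frames from the question and the answer.
x = pd.DataFrame(
    index=pd.MultiIndex.from_product(
        [range(7), range(24)], names=['wkdy', 'hr']),
    columns=['c1', 'c2', 'c3'],
    data=np.arange(3 * 168).reshape(3, 168).T)

dates = pd.date_range('2009-01-01', periods=10000,
                      freq=pd.Timedelta(hours=1))
y = pd.DataFrame(np.random.randn(10000, 3), index=dates,
                 columns=['c1', 'c2', 'c3'])
y['hr'] = y.index.hour
y['wkdy'] = y.index.weekday

# Keep only the key columns of y, give the index a name so it survives
# reset_index(), and pull in x's columns by matching on the keys.
# 'filled' and 'timestamp' are hypothetical names chosen for this sketch.
filled = (
    y[['wkdy', 'hr']]
    .rename_axis('timestamp')
    .reset_index()
    .merge(x.reset_index(), on=['wkdy', 'hr'], how='left')
    .set_index('timestamp')
)

# Spot-check a few rows against the tuple lookup used in the loop.
for ts in filled.index[:50]:
    expected = x.loc[(ts.weekday(), ts.hour)].tolist()
    assert filled.loc[ts, ['c1', 'c2', 'c3']].tolist() == expected

print(filled.head())
```

With how='left' the result keeps the row order of y, so no sort_index() is needed afterwards.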
