I have a dataframe
x = pd.DataFrame(index = ['wkdy','hr'],columns=['c1','c2','c3'])
This leads to 168 rows of data in the dataframe: 7 weekdays and 24 hours in each day. I have another dataframe
dates = pd.date_range('20090101', periods=10000, freq='H')
y = pd.DataFrame(np.random.randn(10000, 3), index=dates, columns=['c1','c2','c3'])
y['hr'] = y.index.hour
y['wkdy'] = y.index.weekday
I want to fill 'y' with data from 'x', so that each weekday and hour gets the same data but has a datestamp attached to it. The only way I know is to loop through the dates and fill in the values. Is there a faster, more efficient way to do this? My solution (rather crude, to say the least) iterates over the entire dataframe y row by row and tries to fill it from dataframe x through a lookup.
for r in range(0, len(y)):
    h = int(y.iloc[r]['hr'])
    w = int(y.iloc[r]['wkdy'])
    # overwrite only the three data columns, not 'hr' and 'wkdy'
    y.iloc[r, :3] = x.loc[(w, h)].values
Your dataframe x doesn't have 168 rows. It has only two, labelled 'wkdy' and 'hr', and looks like
      c1   c2   c3
wkdy  NaN  NaN  NaN
hr    NaN  NaN  NaN
so you can't index it using a tuple as in x.loc[(w,h)]. What you probably had in mind was something like
x = pd.DataFrame(
    index=pd.MultiIndex.from_product(
        [range(7), range(24)], names=['wkdy','hr']),
    columns=['c1','c2','c3'],
    data=np.arange(3 * 168).reshape(3, 168).T)
x
          c1   c2   c3
wkdy hr
0    0     0  168  336
     1     1  169  337
...      ...  ...  ...
6    22  166  334  502
     23  167  335  503
168 rows × 3 columns
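To see the difference between the two constructions, here is a small sketch that builds both frames and does one tuple lookup. The name x_flat is made up for illustration and only stands for the frame from the question.

```python
import numpy as np
import pandas as pd

# The frame from the question: the two strings become two row labels.
x_flat = pd.DataFrame(index=['wkdy', 'hr'], columns=['c1', 'c2', 'c3'])

# The MultiIndex frame: one row per (weekday, hour) pair.
x = pd.DataFrame(
    np.arange(3 * 168).reshape(3, 168).T,
    index=pd.MultiIndex.from_product(
        [range(7), range(24)], names=['wkdy', 'hr']),
    columns=['c1', 'c2', 'c3'])

print(x_flat.shape)  # (2, 3)
print(x.shape)       # (168, 3)

# A (weekday, hour) tuple selects a single row as a Series.
row = x.loc[(3, 0)]
print(row.tolist())  # [72, 240, 408]
```

Only the second frame has a (weekday, hour) pair as its row label, which is what the tuple lookup needs.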
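Now your loop will work, although a more pythonic version would look like this:
for idx, row in y.iterrows():
    # iterrows() yields the keys as floats here, so cast them back to int
    y.loc[idx, ['c1', 'c2', 'c3']] = x.loc[(int(row.wkdy), int(row.hr))].values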
However, iterating through dataframes row by row is very expensive, and you should look for a vectorized solution instead. Here you can simply merge the two frames and remove the unwanted columns:
y = (x.merge(y.reset_index(), on=['wkdy', 'hr'])
      .set_index('index')
      .sort_index()
      .iloc[:, :-3])
y
                     wkdy  hr  c1_x  c2_x  c3_x
index
2009-01-01 00:00:00     3   0    72   240   408
2009-01-01 01:00:00     3   1    73   241   409
...                   ...  ..   ...   ...   ...
2010-02-21 14:00:00     6  14   158   326   494
2010-02-21 15:00:00     6  15   159   327   495
10000 rows × 5 columns
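Now y is a dataframe whose columns c1_x, c2_x, c3_x hold the data from dataframe x where y.wkdy==x.wkdy and y.hr==x.hr. The _x suffix is added by merge because both frames have columns named c1, c2 and c3. In this example, merging was about 1000 times faster than looping.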
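If you would rather keep the original column names and skip the helper columns, another vectorized option is to reindex x by the (weekday, hour) pairs of the dates. This is a sketch of a different technique, not the merge described above. It assumes x has the MultiIndex layout built earlier, and keys is just an illustrative name.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(
    np.arange(3 * 168).reshape(3, 168).T,
    index=pd.MultiIndex.from_product(
        [range(7), range(24)], names=['wkdy', 'hr']),
    columns=['c1', 'c2', 'c3'])

# Hourly timestamps, as in the question.
dates = pd.date_range('2009-01-01', periods=10000,
                      freq=pd.Timedelta(hours=1))

# One (weekday, hour) key per timestamp; the keys repeat every week.
keys = pd.MultiIndex.from_arrays(
    [dates.weekday, dates.hour], names=['wkdy', 'hr'])

# reindex picks the matching row of x for each key, repeats included.
y = x.reindex(keys)
# Swap the (weekday, hour) keys for the actual timestamps.
y.index = dates

# 2009-01-01 was a Thursday (weekday 3), so the first row is x.loc[(3, 0)].
print(y.head(2))
```

This works because the index of x is unique. The keys you reindex by are allowed to repeat.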