I have a DataFrame of the following type:
df = pd.DataFrame({'price':[1,2,3,2,2,3,1,2,1], 'quantity':[10,20,30,10,20,30,20,20,10]})
df
Out[10]:
   price  quantity
0      1        10
0      2        20
1      3        30
1      2        10
1      4        20
2      3        30
3      1        20
4      2        20
4      1        10
and I want to create a second DataFrame, with one column per price level, that looks like:
df_bucket = pd.DataFrame(columns=np.arange(0, 5, 1), index=df.index)
     0    1    2    3    4
0  NaN   10   20  NaN  NaN
1  NaN  NaN   10   30   20
2  NaN  NaN  NaN   30  NaN
3   20  NaN  NaN  NaN  NaN
4   10   20  NaN  NaN  NaN
I tried the following, but it is extremely slow and yields nothing but NaNs:
df_bucket.loc[df.index][df['price']] = df['quantity']
df_bucket
Out[12]:
     0    1    2    3    4
0  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN
I know I can do this in a for loop, but I am sure it will take ages. Do you know of a faster way to accomplish this?
To give some context, this is order book data indexed by mostly unique timestamps. I would like to reorganise the DataFrame with one column per price level while preserving the index, which is a very inefficient, but convenient, way of organising the data. The DataFrame has a few hundred thousand rows, which is why I need something more efficient than looping over the rows.
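import pandas as pd

# 'ind' holds the original (possibly repeated) index labels as a regular column
df = pd.DataFrame({'ind':[0,0,1,1,1,2,3,4,4],
                   'price':[1,2,3,2,2,3,1,2,1],
                   'quantity':[10,20,30,10,20,30,20,20,10]})
# pivot_table averages duplicate (ind, price) pairs by default,
# hence 15.0 for ind 1 at price 2
df = df.pivot_table(index='ind', columns='price', values='quantity')
df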
price     1     2     3
ind
0      10.0  20.0   NaN
1       NaN  15.0  30.0
2       NaN   NaN  30.0
3      20.0   NaN   NaN
4      10.0  20.0   NaN
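The pivot_table call above averages quantities that share the same index and price, and it only produces columns for prices that actually occur. If the quantities should be added up instead, and the result should have a column for every price level from 0 to 4 as in the desired layout, something along these lines may work. This is only a sketch: aggfunc='sum' and the reindex over np.arange(0, 5) are assumptions about what is wanted, and the name bucket is just illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as above; 'ind' stands in for the (possibly repeated) timestamps.
df = pd.DataFrame({'ind': [0, 0, 1, 1, 1, 2, 3, 4, 4],
                   'price': [1, 2, 3, 2, 2, 3, 1, 2, 1],
                   'quantity': [10, 20, 30, 10, 20, 30, 20, 20, 10]})

# Assumption: quantities at the same (ind, price) are summed rather than averaged.
bucket = df.pivot_table(index='ind', columns='price',
                        values='quantity', aggfunc='sum')

# Add the price levels that never occur (0 and 4 here) as all-NaN columns.
bucket = bucket.reindex(columns=np.arange(0, 5))
```

With real data where the timestamps are the index rather than a column, df.reset_index() can turn them into a column first (it is named 'index' unless the index already has a name).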