I know pivot_table takes three main parameters: index, columns and fill_value.
df = pd.pivot_table(df,index='userID',columns='days',fill_value=0) # Fill 0
I can't pivot my DataFrame because it runs out of memory.
Is it possible to split the index into smaller parts, pivot each part, and then merge those pivot tables together to solve this problem?
For example, if userID is in range(0, 1000000), I want to cut it into 3 parts, (0, 333333), (333333, 666666) and (666666, 1000000), and then combine these 3 into one pivot table.
Yes, you can do something like this:
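bounds = [0, 333333, 666666, 1000000]
# Pivot each half-open range [lo, hi) separately so the chunks do not overlap
df_out = pd.concat([df.query('userID >= @lo and userID < @hi').pivot_table(index='userID',
    columns='days', fill_value=0) for lo, hi in zip(bounds[:-1], bounds[1:])])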
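To sanity-check the idea, here is a minimal sketch on made-up data. The value column and the helper name pivot_by_id_range are my own assumptions, not something from the question. It pivots each ID range separately and compares the stacked result with a single full pivot:

```python
import numpy as np
import pandas as pd

# Made-up sample data; 'value' is a hypothetical column to aggregate.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "userID": rng.integers(0, 90, size=500),
    "days": rng.integers(1, 8, size=500),
    "value": rng.random(500),
})

def pivot_by_id_range(frame, bounds):
    """Pivot each half-open userID range [lo, hi) separately, then stack the results."""
    parts = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        chunk = frame[(frame["userID"] >= lo) & (frame["userID"] < hi)]
        if chunk.empty:
            continue  # nothing to pivot for this range
        parts.append(chunk.pivot_table(index="userID", columns="days", fill_value=0))
    # A chunk may lack some 'days' values; concat leaves NaN there, so fill with 0.
    return pd.concat(parts).fillna(0).sort_index(axis=1)

chunked = pivot_by_id_range(df, [0, 30, 60, 90])
full = df.pivot_table(index="userID", columns="days", fill_value=0)
print(np.allclose(chunked.to_numpy(), full.to_numpy()))
```

Keep in mind that the concatenated table is as large as the full pivot would be, so this should mainly help when the peak memory used during the pivot step is the problem.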
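Or split the unique IDs with np.array_split, so that all rows for a user stay in the same chunk:
pd.concat([df[df['userID'].isin(ids)].pivot_table(index='userID',
    columns='days', fill_value=0) for ids in np.array_split(df['userID'].unique(), 3)])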
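One thing to check with any chunked approach: if a user's rows end up in two different chunks, that user appears twice after concat. A small sketch with made-up data (the value column is hypothetical) comparing a split by rows with a split by unique IDs:

```python
import numpy as np
import pandas as pd

# Made-up data in which the rows are not grouped by user.
df = pd.DataFrame({
    "userID": [1, 2, 3, 1, 2, 3],
    "days":   [1, 1, 1, 2, 2, 2],
    "value":  [10, 20, 30, 40, 50, 60],  # hypothetical value column
})

# Row-based split: both halves contain users 1, 2 and 3.
half = len(df) // 2
by_rows = pd.concat([
    part.pivot_table(index="userID", columns="days", fill_value=0)
    for part in (df.iloc[:half], df.iloc[half:])
])

# ID-based split: every user's rows stay together in one chunk.
id_chunks = np.array_split(df["userID"].unique(), 2)
by_ids = pd.concat([
    df[df["userID"].isin(ids)].pivot_table(index="userID", columns="days", fill_value=0)
    for ids in id_chunks
])

print(by_rows.index.is_unique)  # False: each user shows up in both halves
print(by_ids.index.is_unique)   # True: one row per user
```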