As quite a newbie to pandas, I'm struggling with a data arrangement issue.
I've got a huge pile of data from a log file in a pandas dataframe with a structure like this:
day user measure1 measure2 ...
1 u1 xxxxx yyyyy ...
1 u2 xxxxx yyyyy ...
1 u3 xxxxx yyyyy ...
2 u2 xxxxx yyyyy ...
2 u4 xxxxx yyyyy ...
2 u3 xxxxx yyyyy ...
3 u1 xxxxx yyyyy ...
3 u3 xxxxx yyyyy ...
... ... ... ... ...
Hence, not every user appears at each day, while the data is neither sorted by day nor by user. However, if an entry occurs, is has all the measures.
Now I need to rearrange this data to obtain a 2D table "every user" vs. "every day" for each measure and fill the gaps with zeros eg
For measure1: For measure2:
u1 u2 u3 u4 u1 u2 u3 u4
1 xxxxx xxxxx xxxxx 0 1 yyyyy yyyyy yyyyy 0
2 0 xxxxx xxxxx xxxxx 2 0 yyyyy yyyyy yyyyy
3 xxxxx 0 xxxxx 0 3 yyyyy 0 yyyyy 0
How can I do this with pandas in python3? I'm also open to alternative solutions eg using numpy instead of pandas.
So far I managed to extract arrays of all occurring users and days in the dataset but have no clue how to smartly assign the measured data.
I'm grateful for any help on this matter.
It seems like you want a multi-index dataframe (index1: day, index2: measure)
The tricky part is that you might need to transpose your dataframe before these operations. Have a look at the answer of this issue which looks similar to yours Constructing 3D Pandas DataFrame
Hope it helps
You need set_index
and unstack
df.set_index(['day','user']).measure1.unstack(fill_value=0)
Out[6]:
user u1 u2 u3 u4
day
1 xxxxx xxxxx xxxxx 0
2 0 xxxxx xxxxx xxxxx
3 xxxxx 0 xxxxx 0
df.set_index(['day','user']).measure2.unstack(fill_value=0)
Out[7]:
user u1 u2 u3 u4
day
1 yyyyy yyyyy yyyyy 0
2 0 yyyyy yyyyy yyyyy
3 yyyyy 0 yyyyy 0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.