I have data that looks like this:
from pandas import DataFrame
data = [{'id': 1, 'label': 0, 'code': 'f1'},
        {'id': 1, 'label': 0, 'code': 'f2'},
        {'id': 2, 'label': 1, 'code': 'f3'},
        {'id': 2, 'label': 1, 'code': 'f4'}]
df = DataFrame(data)
>>>
  code  id  label
0   f1   1      0
1   f2   1      0
2   f3   2      1
3   f4   2      1
I want to reshape the data into something like this (with proper headers and no incorrect id-label associations):
id  label  f1  f2  f3  f4
 1      0   1   1   0   0
 2      1   0   0   1   1
I tried using pivot_table, but with that the data looks like this:
df['val'] = 1
pt_df = df.pivot_table('val', columns='code', index=['id', 'label'], fill_value=0, dropna=False)
>>>
     f1  f2  f3  f4
1 0   1   1   0   0
  1   0   0   0   0
2 0   0   0   0   0
  1   0   0   1   1
Any suggestions would be helpful! Thanks
I used unstack, which is essentially a pivot:
df['vals'] = 1
df = df.set_index(['id', 'label', 'code']).unstack('code').fillna(0)
# df = df.reset_index()  # to bring id and label back out as columns
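For anyone who wants to try the unstack idea end to end, here is a minimal sketch. It assumes the sample data from the question, and it selects the helper column before unstacking so the result should come out with flat column names rather than a two-level header. The names vals and wide are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
data = [{'id': 1, 'label': 0, 'code': 'f1'},
        {'id': 1, 'label': 0, 'code': 'f2'},
        {'id': 2, 'label': 1, 'code': 'f3'},
        {'id': 2, 'label': 1, 'code': 'f4'}]
df = pd.DataFrame(data)

# Helper column of ones that becomes the cell value after reshaping.
df['vals'] = 1

# Select the single 'vals' Series first so the unstacked columns are
# just the codes; fill_value=0 fills the id/label/code combinations
# that never occur in the data.
wide = (df.set_index(['id', 'label', 'code'])['vals']
          .unstack('code', fill_value=0)
          .reset_index())   # move id and label from the index to columns
wide.columns.name = None    # drop the leftover 'code' axis label

print(wide)
```

Note that this relies on each id/label/code combination appearing at most once; unstack raises an error on duplicate index entries.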
Here is one way:
>>> df.pivot_table(columns='code', index=['id', 'label'], aggfunc=len, fill_value=0)
code      f1  f2  f3  f4
id label
1  0       1   1   0   0
2  1       0   0   1   1

[2 rows x 4 columns]
If you want the id/label info in columns instead of the index, just use reset_index.
Your example data set is small, so it's not clear whether this will generalize the way you want. It sets the value for each combination of id/label and code to the number of rows of the DataFrame having that combination (e.g., the value for id=1, label=0, code=f1 is 1 because there is exactly one row with those values).
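To make the counting behaviour concrete, here is a small sketch using pd.crosstab, which is a different function from the pivot_table call above but also counts rows per combination. One duplicate row has been added to the question's data purely for illustration, and the names counts, indicators and wide are made up. If you want 0/1 flags rather than counts, you can threshold the counts afterwards.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'label': 0, 'code': 'f1'},
    {'id': 1, 'label': 0, 'code': 'f1'},  # duplicate added for illustration only
    {'id': 1, 'label': 0, 'code': 'f2'},
    {'id': 2, 'label': 1, 'code': 'f3'},
    {'id': 2, 'label': 1, 'code': 'f4'},
])

# Count the rows for each observed (id, label) pair and each code.
counts = pd.crosstab([df['id'], df['label']], df['code'])

# Turn the counts into 0/1 indicators: 1 if the combination occurs at all.
indicators = (counts > 0).astype(int)

# Move id and label out of the index and drop the 'code' axis label.
wide = indicators.reset_index()
wide.columns.name = None

print(counts)
print(wide)
```

Here counts holds 2 for id=1, label=0, code=f1 because two rows share those values, while wide holds 1 in the same cell.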