
pandas pivot_table to DataFrame

I have data that looks like this:

from pandas import DataFrame
data = [{'id': 1, 'label': 0, 'code': 'f1'}, {'id': 1, 'label': 0, 'code': 'f2'},
            {'id': 2, 'label': 1, 'code': 'f3'},
            {'id': 2, 'label': 1, 'code': 'f4'}]
df = DataFrame(data)

>>>
    code  id  label
0   f1   1      0
1   f2   1      0
2   f3   2      1
3   f4   2      1

I want to reshape the data to be something like this (with proper headers and no incorrect id-label associations).

   id label  f1  f2  f3  f4
    1     0   1   1   0   0
    2     1   0   0   1   1

I tried using pivot_table, but with that the data comes out like this, including rows for id/label combinations that never occur in the input:

df['val'] = 1
pt_df = df.pivot_table('val', columns='code', index=['id', 'label'], fill_value=0, dropna=False)

>>>
     f1  f2  f3  f4
1 0   1   1   0   0
  1   0   0   0   0
2 0   0   0   0   0
  1   0   0   1   1

Any suggestions would be helpful! Thanks

I used unstack, which is essentially a pivot:

df['vals'] = 1
df = df.set_index(['id', 'label', 'code']).unstack('code').fillna(0)
# df = df.reset_index()  # to bring id and label back out as columns
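
For reference, here is a self-contained sketch of the same unstack idea, under a couple of assumptions that are mine rather than the answer's: it selects the 'vals' column as a Series before unstacking (so the result has plain f1..f4 headers instead of a two-level column index), and it names the result `wide`. Note that unstack needs each id/label/code combination to be unique; with duplicate rows it raises an error.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "label": [0, 0, 1, 1],
    "code": ["f1", "f2", "f3", "f4"],
})
df["vals"] = 1

# Unstack the 'vals' Series so the codes become flat column headers;
# fill_value=0 marks combinations that do not occur.
wide = df.set_index(["id", "label", "code"])["vals"].unstack("code", fill_value=0)

wide.columns.name = None   # drop the leftover 'code' label on the header row
wide = wide.reset_index()  # move id and label from the index into columns

print(wide)
```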

Here is one way:

>>> df.pivot_table(columns='code', index=['id', 'label'], aggfunc=len, fill_value=0)
code      f1  f2  f3  f4
id label                
1  0       1   1   0   0
2  1       0   0   1   1

[2 rows x 4 columns]

If you want the id/label info in columns instead of the index, just call reset_index on the result.
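
A minimal sketch of that last step, assuming the helper column 'val' from the question and passing it explicitly as the values argument (my choice, to keep the column headers flat). It uses aggfunc='count' rather than len, which should give the same numbers here since 'val' has no missing entries. It also leaves out dropna=False, which appears to be what produced the all-zero rows for (1, 1) and (2, 0) in the question's attempt.

```python
import pandas as pd

# Same data as in the question, plus the helper column.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "label": [0, 0, 1, 1],
    "code": ["f1", "f2", "f3", "f4"],
})
df["val"] = 1

# Count rows per (id, label) x code; absent combinations become 0.
pt = df.pivot_table(values="val", index=["id", "label"], columns="code",
                    aggfunc="count", fill_value=0)

# Move id/label into ordinary columns and clear the 'code' header label.
out = pt.reset_index().rename_axis(columns=None)

print(out)
```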

Your example data set is small, so it's not clear whether this will generalize the way you want. What it does is set the value for each combination of id/label and code to the number of rows of the DataFrame having that combination (e.g., the value for id=1, label=0, code=f1 is 1 because there is exactly one row with those values). If a combination appears more than once, you will get a count greater than 1 rather than a 0/1 flag.
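
To illustrate that caveat, here is a small sketch on made-up data (not from the question) where id=1, label=0, code=f1 appears twice. It uses pd.crosstab, which is a different route to the same kind of count table, and then caps the counts at 1 in case 0/1 indicators are what is wanted. The names `counts` and `flags` are just for illustration.

```python
import pandas as pd

# Hypothetical data: the (id=1, label=0, code='f1') row is duplicated.
df = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "label": [0, 0, 0, 1],
    "code": ["f1", "f1", "f2", "f3"],
})

# Count table: rows are (id, label) pairs, columns are codes.
counts = pd.crosstab([df["id"], df["label"]], df["code"])

# Cap every count at 1 to turn the counts into presence/absence flags.
flags = counts.clip(upper=1)

print(counts)
print(flags)
```

Here the f1 cell for id=1, label=0 is 2 in `counts` and 1 in `flags`.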
