简体   繁体   中英

Python pandas dataframe pivot only works with pivot_table() but not with set_index() and unstack()

I am trying to pivot following type of sample data in Pandas dataframe in Python. I came across couple of other stackoverflow answers that discussed how to do the pivot: pivot_table No numeric types to aggregate

However, when I use pivot_table() , I am able to pivot the data. But when I use set_index() and unstack() , I get following error:

AttributeError: 'NoneType' object has no attribute 'unstack'

Sample Data:

id  responseTime    label   answers
ABC 2018-06-24  Category_1  [3]
ABC 2018-06-24  Category_2  [10]
ABC 2018-06-24  Category_3  [10]
DEF 2018-06-25  Category_1  [7]
DEF 2018-06-25  Category_8  [10]
GHI 2018-06-28  Category_3  [7]

Desired Output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL

This works:

 df=pdDF.pivot_table(index=['items_id','responseTime'], columns='label', values='answers', aggfunc='first') 

This doesn't work:

pdDF.set_index(['items_id','responseTime','label'], append=True, inplace=True).unstack('label')

I also used pdDF[pdDF.isnull().any(axis=1)] to make sure I don't have any NULL data in answers column. I also used append=False but same error happened.

From other threads, it seems set_index() and unstack() are more efficient than pivot_table() . I also don't want to use pivot_table() because it requires aggregation function and my answers column doesn't contain numeric data. I didn't want to use default ( mean() ) so I ended up using first() . Any insights on why one method works and another doesn't?

AttributeError: 'NoneType' object has no attribute 'unstack'

When you use inplace = True in set_index it modified the dataframe in place. It doesn't return anything (None) . So you can't use unstack on None object.

inplace : boolean, default False

Modify the DataFrame in place (do not create a new object)

Use:

df1 = pdDF.set_index(['items_id','responseTime','label']).unstack('label')    
print(df1)

# Output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM