
Python 3: Transpose columns of Pandas Data Frame / "melt" data frame

I have a Pandas Data Frame like this:

    uid   category   count
0    1     comedy     5
1    1     drama      7
2    2     drama      4
3    3     other      10    
4    3     comedy     6

Except there are dozens of categories, millions of rows, and a few dozen other columns.

I want to turn that into something like this:

    id   cat_comedy   cat_drama    cat_other
0    1    5            7            0
1    2    0            4            0
2    3    6            0            10

I have no idea how to do this and am looking for tips/hints/full solutions. I don't really care about the row index.

Thanks.

I think this is what you're after (the operation is called a 'pivot'):

from pandas import DataFrame

df = DataFrame([
    {'id': 1, 'category': 'comedy', 'count': 5},
    {'id': 1, 'category': 'drama', 'count': 7},
    {'id': 2, 'category': 'drama', 'count': 4},
    {'id': 3, 'category': 'other', 'count': 10},
    {'id': 3, 'category': 'comedy', 'count': 6}
]).set_index('id')

result = df.pivot(columns=['category'])

print(result)

Result:

          count
category comedy drama other
id
1           5.0   7.0   NaN
2           NaN   4.0   NaN
3           6.0   NaN  10.0

In response to your comment: if you don't want the id as the index of the df, you can instead tell the pivot operation which column to use as its index. You'll need pivot_table instead of pivot to achieve this, as it can handle duplicate values for one pivoted index/column pair (it aggregates them).

And replacing the NaN with zeroes is also an option:


df = DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5},
    {'uid': 1, 'category': 'drama', 'count': 7},
    {'uid': 2, 'category': 'drama', 'count': 4},
    {'uid': 3, 'category': 'other', 'count': 10},
    {'uid': 3, 'category': 'comedy', 'count': 6}
])

result = df.pivot_table(columns=['category'], index='uid', fill_value=0)

print(result)

However, note that the resulting table still has uid as its index. If that's not what you want, you can turn the index back into a regular column with reset_index:

result = df.pivot_table(columns=['category'], index='uid', fill_value=0).reset_index()

The final result:

         uid  count
category     comedy drama other
0          1      5     7     0
1          2      0     4     0
2          3      6     0    10
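The result above still has two column levels (count on top, the categories below), whereas the question asked for flat names like cat_comedy. A minimal sketch of one way to get there, assuming only the count column should be pivoted; the variable name wide is just for illustration:

```python
import pandas as pd

df = pd.DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5},
    {'uid': 1, 'category': 'drama', 'count': 7},
    {'uid': 2, 'category': 'drama', 'count': 4},
    {'uid': 3, 'category': 'other', 'count': 10},
    {'uid': 3, 'category': 'comedy', 'count': 6}
])

wide = (
    # values='count' keeps the columns single-level (no 'count' header on top)
    df.pivot_table(index='uid', columns='category', values='count',
                   aggfunc='sum', fill_value=0)
      .add_prefix('cat_')          # comedy -> cat_comedy, etc.
      .rename_axis(columns=None)   # drop the leftover 'category' axis label
      .reset_index()               # turn uid back into a regular column
)

print(wide)
```

After this, wide has the plain columns uid, cat_comedy, cat_drama and cat_other, with one row per uid.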

The original answer from @Grismar (upvoted since he got it in first) is really close but doesn't quite work. Don't set uid as the index before the pivot call (leave it as a regular column), and then do the following:

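# aggfunc is given as the string 'sum'; recent pandas versions warn when the builtin sum is passed
df2 = df.pivot_table(columns='category', index='uid', aggfunc='sum')
df2 = df2.fillna(0).reset_index()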

df2 is now the dataframe you want. The fillna function replaces all the NaNs with 0s.
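To see why the aggregation function matters, here is a small sketch with a repeated (uid, category) pair. The frame dup and its duplicated row are made up for illustration and are not from the question's data. Plain pivot refuses to reshape it, while pivot_table combines the repeats:

```python
import pandas as pd

dup = pd.DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5},
    {'uid': 1, 'category': 'comedy', 'count': 2},  # hypothetical repeated pair
    {'uid': 2, 'category': 'drama', 'count': 4}
])

# pivot cannot decide which of the two values to keep, so it raises
try:
    dup.pivot(index='uid', columns='category', values='count')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table adds the repeated values together instead
summed = dup.pivot_table(index='uid', columns='category',
                         values='count', aggfunc='sum').fillna(0)

print(summed)
```

Whether summing is the right way to combine repeats depends on what count means in your data; 'mean', 'max' and so on are passed the same way.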

Complete solution using pivot_table:

import pandas as pd

df = pd.DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5},
    {'uid': 1, 'category': 'drama', 'count': 7},
    {'uid': 2, 'category': 'drama', 'count': 4},
    {'uid': 3, 'category': 'other', 'count': 10},
    {'uid': 3, 'category': 'comedy', 'count': 6}
])

result = df.pivot_table(
    columns='category',
    index='uid',
    aggfunc='sum',  # string form; recent pandas versions warn when the builtin sum is passed
    fill_value=0
)

print(result)
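Since the question mentions a few dozen other columns, it may be worth noting that pivot_table without a values argument pivots every remaining column it can aggregate. A rough sketch, where minutes is a hypothetical stand-in for one of those extra numeric columns:

```python
import pandas as pd

df = pd.DataFrame([
    {'uid': 1, 'category': 'comedy', 'count': 5, 'minutes': 90},
    {'uid': 1, 'category': 'drama', 'count': 7, 'minutes': 120},
    {'uid': 2, 'category': 'drama', 'count': 4, 'minutes': 60},
    {'uid': 3, 'category': 'other', 'count': 10, 'minutes': 45},
    {'uid': 3, 'category': 'comedy', 'count': 6, 'minutes': 30}
])

# No values argument: both 'count' and 'minutes' get pivoted
all_values = df.pivot_table(index='uid', columns='category',
                            aggfunc='sum', fill_value=0)

# values='count': only the column of interest is pivoted
only_count = df.pivot_table(index='uid', columns='category',
                            values='count', aggfunc='sum', fill_value=0)

print(all_values.columns.get_level_values(0).unique().tolist())
print(only_count.columns.tolist())
```

With millions of rows, restricting values this way should also avoid aggregating columns you never look at.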
