pivot one column in pandas dataframe and create 4 new columns

I'm working with a pandas DataFrame that looks like this:

    df
    COUNTRY   LINE    PRODUCT    SERVICE
    Argelia    1       1.0        Mobile
    Argelia    1       2.0        Mobile
    Argelia    1       3.0        Mobile
    Argelia    2       1.0        Mobile
    Argelia    3       2.0        Mobile
    Argelia    3       3.0        Mobile

I want to group by LINE and pivot the PRODUCT column, but I need four product columns (product_1, product_2, product_3 and product_4) regardless of whether any row has PRODUCT = 4.

I'm trying to use get_dummies with this code:

df = pd.concat([df, pd.get_dummies(df['PRODUCT'], prefix='product')], axis=1)
df.drop(['PRODUCT'], axis=1, inplace=True)
df = df.groupby(['COUNTRY', 'LINE', 'SERVICE']).agg({'product_1' : np.max, 'product_2': np.max, 'product_3':np.max, 'product_4':np.max}).reset_index()

But it gives me only three product columns. I want four, so that the result is this DataFrame:

 COUNTRY    LINE   SERVICE   product_1  product_2  product_3  product_4
 Argelia     1     Mobile       1          1          1           0
 Argelia     2     Mobile       1          0          0           0
 Argelia     3     Mobile       0          1          1           0

Is it possible?

(I also need to convert the PRODUCT values from floats such as 1.0 to integers such as 1.)

Use DataFrame.reindex with a list of all possible product columns. Here is an alternative solution, which I expect to be faster: DataFrame.pivot_table to count rows, DataFrame.clip to cap the counts at 1, rename to convert the float column labels to integers, DataFrame.add_prefix, and finally reindex:

cols = [f'product_{i}' for i in range(1, 5)]
df1 = (df.pivot_table(index=['COUNTRY', 'LINE', 'SERVICE'],
                      columns='PRODUCT',
                      fill_value=0,
                      aggfunc='size')
        .clip(upper=1)
        .rename(columns=int)
        .add_prefix('product_')
        .reindex(cols, axis=1, fill_value=0))
print (df1)
PRODUCT               product_1  product_2  product_3  product_4
COUNTRY LINE SERVICE                                            
Argelia 1    Mobile           1          1          1          0
        2    Mobile           1          0          0          0
        3    Mobile           0          1          1          0
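
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same pivot_table approach. The sample data is typed in from the question. The last two steps (rename_axis and reset_index) are my own addition, assuming you want COUNTRY, LINE and SERVICE back as ordinary columns, as in the desired output; the name `flat` is just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COUNTRY": ["Argelia"] * 6,
    "LINE": [1, 1, 1, 2, 3, 3],
    "PRODUCT": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "SERVICE": ["Mobile"] * 6,
})

# Every product column we want, whether or not it occurs in the data.
cols = [f"product_{i}" for i in range(1, 5)]

flat = (
    df.pivot_table(index=["COUNTRY", "LINE", "SERVICE"],
                   columns="PRODUCT",
                   aggfunc="size",      # count rows per (group, product)
                   fill_value=0)
      .clip(upper=1)                    # counts -> 0/1 indicators
      .rename(columns=int)              # column labels 1.0 -> 1
      .add_prefix("product_")
      .reindex(cols, axis=1, fill_value=0)  # adds the missing product_4
      .rename_axis(columns=None)        # drop the leftover "PRODUCT" label
      .reset_index()                    # group keys back to plain columns
)
print(flat)
```

The reindex call is what guarantees the fourth column: any label in `cols` that is absent from the pivoted frame is created and filled with 0.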

In your solution, use DataFrame.pop to extract the column, convert it to integers, then aggregate by max and add reindex:

# dtype=int keeps the dummies as 0/1 (pandas 2.0+ returns booleans by default)
df = pd.concat([df, pd.get_dummies(df.pop('PRODUCT').astype(int), prefix='product', dtype=int)], axis=1)
cols = [f'product_{i}' for i in range(1, 5)]
df = df.groupby(['COUNTRY', 'LINE', 'SERVICE']).max().reindex(cols, axis=1, fill_value=0)
print (df)
                      product_1  product_2  product_3  product_4
COUNTRY LINE SERVICE                                            
Argelia 1    Mobile           1          1          1          0
        2    Mobile           1          0          0          0
        3    Mobile           0          1          1          0
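
A different way to get the fourth column, not the one used above, is to make PRODUCT a categorical with all four categories declared up front; get_dummies then emits one column per category, including unobserved ones, so no reindex is needed. This is only a sketch: the category list [1, 2, 3, 4] is an assumption taken from the question's product_1 to product_4, and the names `product`, `dummies` and `out` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "COUNTRY": ["Argelia"] * 6,
    "LINE": [1, 1, 1, 2, 3, 3],
    "PRODUCT": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    "SERVICE": ["Mobile"] * 6,
})

# Assumed full set of products; 4 never occurs in the sample data.
product_dtype = pd.CategoricalDtype(categories=[1, 2, 3, 4])
product = df["PRODUCT"].astype(int).astype(product_dtype)

# One 0/1 column per declared category, including the unobserved 4.
dummies = pd.get_dummies(product, prefix="product", dtype=int)

out = (
    pd.concat([df.drop(columns="PRODUCT"), dummies], axis=1)
      .groupby(["COUNTRY", "LINE", "SERVICE"], as_index=False)
      .max()   # 1 if the line has the product in any row, else 0
)
print(out)
```

The trade-off is that the set of products has to be known in advance, which is also true of the `cols` list in the reindex version.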
