Unpivot pandas Dataframe with text data

Since all the previous questions I could find about unpivoting a dataframe referred to numeric data, I still haven't found how to proceed with the following.

Let's say I have a Dataframe set up as follows:

+--------+--------+--------+-------+
| Level1 | Level2 | Level3 | Props |
+--------+--------+--------+-------+
| A      | A      | C      | X,Y   |
+--------+--------+--------+-------+
| A      | B      | C      | Y,Z   |
+--------+--------+--------+-------+
| D      | E      | F      | Y,Z   |
+--------+--------+--------+-------+
| G      | H      | I      | X,Z   |
+--------+--------+--------+-------+

And I would like to get this:

+--------+--------+--------+---+---+---+
| Level1 | Level2 | Level3 | X | Y | Z |
+--------+--------+--------+---+---+---+
| A      | A      | C      | 1 | 1 | 0 |
+--------+--------+--------+---+---+---+
| A      | B      | C      | 0 | 1 | 1 |
+--------+--------+--------+---+---+---+
| D      | E      | F      | 0 | 1 | 1 |
+--------+--------+--------+---+---+---+
| G      | H      | I      | 1 | 0 | 1 |
+--------+--------+--------+---+---+---+

How could I do this?

Thanks!

R.

You could create the dummies with pd.Series.str.get_dummies and concatenate them back onto the source dataframe:

pd.concat((df.drop(columns="Props"), df.Props.str.get_dummies(",")), axis=1)


 Level1 Level2  Level3  X   Y   Z
0   A      A       C    1   1   0
1   A      B       C    0   1   1
2   D      E       F    0   1   1
3   G      H       I    1   0   1
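
For a self-contained check, the sketch below rebuilds the sample frame from the question and applies the same get_dummies-and-concat idea. Building the frame from a dict is an assumption, since the question does not show how the data is loaded.

```python
import pandas as pd

# Sample data from the question; building it from a dict is an assumption.
df = pd.DataFrame({
    "Level1": ["A", "A", "D", "G"],
    "Level2": ["A", "B", "E", "H"],
    "Level3": ["C", "C", "F", "I"],
    "Props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# Split each Props string on "," and build one 0/1 indicator column per value.
dummies = df["Props"].str.get_dummies(sep=",")

# Put the indicator columns next to the original columns, without Props.
result = pd.concat([df.drop(columns="Props"), dummies], axis=1)
print(result)
```

Because str.get_dummies works directly on the delimited strings, no intermediate list column is needed.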

As suggested by @BEN_YO, you could also use a join:

df.join(df.pop("Props").str.get_dummies(","))
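
One thing to keep in mind with the pop variant: DataFrame.pop removes the column from df in place, so the original frame no longer has Props afterwards. If the original should stay intact, a sketch along these lines works on a copy instead (the sample frame and the name `work` are illustrative assumptions):

```python
import pandas as pd

# Sample data from the question; building it from a dict is an assumption.
df = pd.DataFrame({
    "Level1": ["A", "A", "D", "G"],
    "Level2": ["A", "B", "E", "H"],
    "Level3": ["C", "C", "F", "I"],
    "Props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# Work on a copy so that pop() does not alter the original frame.
work = df.copy()
props = work.pop("Props")  # removes Props from `work` only

# Join the indicator columns back on the shared row index.
result = work.join(props.str.get_dummies(sep=","))
print(result)
print("Props" in df.columns)  # the original frame still has the column
```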

Try this:

import pandas as pd

# read the tab-separated file
df = pd.read_csv('test.csv', delimiter='\t')

# turn each props string into a list of values
df['props'] = df['props'].map(lambda x: x.split(','))

# one-hot encode the values, then sum per original row
# (sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent)
df1 = pd.get_dummies(df.props.apply(pd.Series).stack()).groupby(level=0).sum()

# concatenate the dummies with the original df, dropping 'props'
new_df = pd.concat([df.drop(columns='props'), df1], axis=1)
print(new_df)

Or, more compactly:

df['props'] = df['props'].map(lambda x: x.split(','))
new_df = pd.concat([df.drop(columns='props'), pd.get_dummies(df.props.apply(pd.Series).stack()).groupby(level=0).sum()], axis=1)
print(new_df)

Input:

level1  level2  level3  props
A       A       C       X,Y
A       B       C       Y,Z
D       D       F       Y,Z
G       G       I       X,Z

Output:

  level1 level2 level3  X  Y  Z
0      A      A      C  1  1  0
1      A      B      C  0  1  1
2      D      D      F  0  1  1
3      G      G      I  1  0  1
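
To try this approach without a file on disk, the sketch below feeds the Input shown above through io.StringIO. The in-memory text is an assumption standing in for test.csv, and the name `raw` is illustrative.

```python
import io
import pandas as pd

# Tab-separated text standing in for 'test.csv' (rows copied from the Input above).
raw = (
    "level1\tlevel2\tlevel3\tprops\n"
    "A\tA\tC\tX,Y\n"
    "A\tB\tC\tY,Z\n"
    "D\tD\tF\tY,Z\n"
    "G\tG\tI\tX,Z\n"
)
df = pd.read_csv(io.StringIO(raw), delimiter="\t")

# Split into lists, spread each list over columns, then stack into a long
# Series indexed by (original row, position in the list).
values = df["props"].str.split(",").apply(pd.Series).stack()

# One-hot encode the long Series and sum within each original row (index level 0).
dummies = pd.get_dummies(values).groupby(level=0).sum()

new_df = pd.concat([df.drop(columns="props"), dummies], axis=1)
print(new_df)
```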

Another option is to explode the list column and pivot it (DataFrame.explode needs pandas 0.25 or later):

In [209]: df = pd.DataFrame({'level1': list('AADG'), 'level2': list("ABEH"), 'level3': list("CCFI"), 'props':[list("XY"), list("YZ"), list("YZ"), list("XZ")] })

In [210]: df
Out[210]:
  level1 level2 level3   props
0      A      A      C  [X, Y]
1      A      B      C  [Y, Z]
2      D      E      F  [Y, Z]
3      G      H      I  [X, Z]

In [211]: df['dummy'] = 1

In [212]: df[['level1', 'level2', 'level3']].join(df.explode('props').pivot(columns='props', values='dummy')).fillna(value=0)
Out[212]: 
  level1 level2 level3    X    Y    Z
0      A      A      C  1.0  1.0  0.0
1      A      B      C  0.0  1.0  1.0
2      D      E      F  0.0  1.0  1.0
3      G      H      I  1.0  0.0  1.0
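
The pivot produces floats because missing combinations are NaN before fillna runs. If integer 0/1 columns are wanted, as in the question, one possible variant casts the pivoted block to int before joining. This is a sketch that reuses the sample frame from this answer; the name `flags` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "level1": list("AADG"),
    "level2": list("ABEH"),
    "level3": list("CCFI"),
    "props": [list("XY"), list("YZ"), list("YZ"), list("XZ")],
})

# One row per (original row, prop), with a constant marker column to pivot on.
long = df.explode("props").assign(dummy=1)

# Pivot the markers into one column per prop, fill the gaps with 0, cast to int.
flags = long.pivot(columns="props", values="dummy").fillna(0).astype(int)

# Attach the indicator columns to the level columns via the shared row index.
result = df.drop(columns="props").join(flags)
print(result)
```

Using assign here avoids adding the helper column to df itself.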
