Since all the previous questions I could find about unpivoting a dataframe referred to numeric data, I still haven't found how to proceed with the following.
Let's say I have a DataFrame set up as follows:
+--------+--------+--------+-------+
| Level1 | Level2 | Level3 | Props |
+--------+--------+--------+-------+
| A | A | C | X,Y |
+--------+--------+--------+-------+
| A | B | C | Y,Z |
+--------+--------+--------+-------+
| D | E | F | Y,Z |
+--------+--------+--------+-------+
| G | H | I | X,Z |
+--------+--------+--------+-------+
And I would like to get this:
+--------+--------+--------+---+---+---+
| Level1 | Level2 | Level3 | X | Y | Z |
+--------+--------+--------+---+---+---+
| A | A | C | 1 | 1 | 0 |
+--------+--------+--------+---+---+---+
| A | B | C | 0 | 1 | 1 |
+--------+--------+--------+---+---+---+
| D | E | F | 0 | 1 | 1 |
+--------+--------+--------+---+---+---+
| G | H | I | 1 | 0 | 1 |
+--------+--------+--------+---+---+---+
How could I do this?
Thanks!
R.
You could create the dummies with pd.Series.str.get_dummies
and concatenate them back to the source dataframe:
pd.concat((df.drop(columns="Props"), df.Props.str.get_dummies(",")), axis=1)
Level1 Level2 Level3 X Y Z
0 A A C 1 1 0
1 A B C 0 1 1
2 D E F 0 1 1
3 G H I 1 0 1
As suggested by @BEN_YO, you could also use a join (note that pop removes the Props column from df in place):
df.join(df.pop("Props").str.get_dummies(","))
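For anyone who wants to try this without an existing df, here is a minimal self-contained sketch. The frame is rebuilt by hand from the table in the question, and the names dummies and result are just illustrative:

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question
df = pd.DataFrame({
    "Level1": list("AADG"),
    "Level2": list("ABEH"),
    "Level3": list("CCFI"),
    "Props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# Split each string on "," and build one indicator column per distinct value
dummies = df["Props"].str.get_dummies(sep=",")

# Drop the original column and attach the indicators; df itself is not modified
result = df.drop(columns="Props").join(dummies)
print(result)
```

Unlike the pop version, this leaves df untouched, which may be preferable if the original column is still needed later.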
Try this:
import pandas as pd
# read the tab-separated file
df = pd.read_csv('test.csv', delimiter='\t')
# turn the props column into lists of values
df['props'] = df['props'].map(lambda x: x.split(','))
# build the dummies; sum(level=0) was removed in pandas 2.0, so group by the index level instead
df1 = pd.get_dummies(df.props.apply(pd.Series).stack()).groupby(level=0).sum()
# concatenate the dummies with the original df and drop 'props'
new_df = pd.concat([df.drop(columns='props'), df1], axis=1)
print(new_df)
Or, more compactly:
df['props'] = df['props'].map(lambda x: x.split(','))
new_df = pd.concat([df.drop(columns='props'), pd.get_dummies(df.props.apply(pd.Series).stack()).groupby(level=0).sum()], axis=1)
print(new_df)
Input :
level1 level2 level3 props
A A C X,Y
A B C Y,Z
D D F Y,Z
G G I X,Z
Output :
level1 level2 level3 X Y Z
0 A A C 1 1 0
1 A B C 0 1 1
2 D D F 0 1 1
3 G G I 1 0 1
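The same idea can probably be written a bit more directly with explode instead of apply(pd.Series).stack(). This is only a sketch: the frame below is typed in from the Input shown above rather than read from test.csv, and the intermediate names exploded and dummies are my own:

```python
import pandas as pd

# Data typed in from the "Input" table above (instead of reading test.csv)
df = pd.DataFrame({
    "level1": list("AADG"),
    "level2": list("ABDG"),
    "level3": list("CCFI"),
    "props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# One row per individual value; the original row label is repeated in the index
exploded = df["props"].str.split(",").explode()

# Indicator columns per value, then collapse back to one row per original label
dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).sum()

new_df = df.drop(columns="props").join(dummies)
print(new_df)
```

Passing dtype=int keeps the indicators as 0/1 integers, since newer pandas versions return booleans from get_dummies by default.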
In [208]: df
Out[208]:
level1 level2 level3 props dummy
0 A A C [X, Y] 1
1 A B C [Y, Z] 1
2 D E F [Y, Z] 1
3 G H I [X, Z] 1
In [209]: df = pd.DataFrame({'level1': list('AADG'), 'level2': list("ABEH"), 'level3': list("CCFI"), 'props':[list("XY"), list("YZ"), list("YZ"), list("XZ")] })
In [210]: df
Out[210]:
level1 level2 level3 props
0 A A C [X, Y]
1 A B C [Y, Z]
2 D E F [Y, Z]
3 G H I [X, Z]
In [211]: df['dummy'] = 1
In [212]: df[['level1', 'level2', 'level3']].join(df.explode('props').pivot(columns='props', values='dummy')).fillna(value=0)
Out[212]:
level1 level2 level3 X Y Z
0 A A C 1.0 1.0 0.0
1 A B C 0.0 1.0 1.0
2 D E F 0.0 1.0 1.0
3 G H I 1.0 0.0 1.0
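The pivot leaves NaN for missing combinations, which is why the result above comes out as floats. If integer 0/1 columns are wanted, as in the question, a cast after fillna should do it. A minimal sketch, where wide and result are names I made up:

```python
import pandas as pd

df = pd.DataFrame({
    "level1": list("AADG"),
    "level2": list("ABEH"),
    "level3": list("CCFI"),
    "props": [list("XY"), list("YZ"), list("YZ"), list("XZ")],
})
df["dummy"] = 1

# One row per list element, then spread the values of props into columns
wide = df.explode("props").pivot(columns="props", values="dummy")

# Replace the NaN gaps with 0 and cast the float columns back to integers
result = df[["level1", "level2", "level3"]].join(wide.fillna(0).astype(int))
print(result)
```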