Unpivot pandas DataFrame with text data
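All the previous questions I could find about unpivoting DataFrames deal with numeric data, so I still haven't figured out how to do the following.
Suppose I have a DataFrame set up like this: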
+--------+--------+--------+-------+
| Level1 | Level2 | Level3 | Props |
+--------+--------+--------+-------+
| A | A | C | X,Y |
+--------+--------+--------+-------+
| A | B | C | Y,Z |
+--------+--------+--------+-------+
| D | E | F | Y,Z |
+--------+--------+--------+-------+
| G | H | I | X,Z |
+--------+--------+--------+-------+
I'd like to get:
+--------+--------+--------+---+---+---+
| Level1 | Level2 | Level3 | X | Y | Z |
+--------+--------+--------+---+---+---+
| A | A | C | 1 | 1 | 0 |
+--------+--------+--------+---+---+---+
| A | B | C | 0 | 1 | 1 |
+--------+--------+--------+---+---+---+
| D | E | F | 0 | 1 | 1 |
+--------+--------+--------+---+---+---+
| G | H | I | 1 | 0 | 1 |
+--------+--------+--------+---+---+---+
How can I do this?
Thanks!
R.
You can use pd.Series.str.get_dummies to create the dummy columns and concatenate them back onto the source DataFrame:
pd.concat((df.drop(columns="Props"), df.Props.str.get_dummies(",")), axis=1)
Level1 Level2 Level3 X Y Z
0 A A C 1 1 0
1 A B C 0 1 1
2 D E F 0 1 1
3 G H I 1 0 1
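To see what str.get_dummies contributes on its own, here is a self-contained sketch. It rebuilds the sample data from the question's table (the DataFrame construction is an assumption standing in for however df was actually loaded), prints the indicator frame, and then concatenates it. It passes columns= to drop because pandas 2.0 no longer accepts the axis as a positional argument.

```python
import pandas as pd

# Sample data copied from the question's table (assumed construction)
df = pd.DataFrame({
    "Level1": ["A", "A", "D", "G"],
    "Level2": ["A", "B", "E", "H"],
    "Level3": ["C", "C", "F", "I"],
    "Props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# str.get_dummies splits each string on sep and returns one 0/1 column
# per distinct token, with the columns in sorted order
dummies = df["Props"].str.get_dummies(sep=",")
print(dummies)

# Put the indicator columns next to everything except "Props"
result = pd.concat([df.drop(columns="Props"), dummies], axis=1)
print(result)
```

Because concat with axis=1 aligns on the index, each row's indicators stay attached to the right Level1/Level2/Level3 values.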
As @BEN_YO suggested, you can also use join:
df.join(df.pop("Props").str.get_dummies(","))
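Note that pop removes "Props" from df in place, so the one-liner mutates the original frame. A minimal sketch, assuming the question's sample data, that runs the same join on a copy so df keeps its "Props" column:

```python
import pandas as pd

# Sample data copied from the question's table (assumed construction)
df = pd.DataFrame({
    "Level1": ["A", "A", "D", "G"],
    "Level2": ["A", "B", "E", "H"],
    "Level3": ["C", "C", "F", "I"],
    "Props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# Work on a copy: pop() deletes the column from the frame it is called on
work = df.copy()

# pop() runs before join(), so the indicators are joined onto a frame that
# no longer has "Props"; join() aligns rows on the shared index
result = work.join(work.pop("Props").str.get_dummies(","))
print(result)
```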
Try this:
import pandas as pd
#reading the tab-separated file
df = pd.read_csv('test.csv',delimiter='\t')
#making props column a list containing variables
df['props'] = df['props'].map(lambda x : x.split(','))
#getting dummies; groupby(level=0).sum() replaces sum(level=0), which pandas 2.0 removed
df1 = pd.get_dummies(df.props.apply(pd.Series).stack()).groupby(level=0).sum()
#concatenating dummies df with original df and dropping 'props'
new_df = pd.concat([df.drop(columns='props'),df1],axis=1)
print(new_df)
Or:
df['props'] = df['props'].map(lambda x : x.split(','))
new_df = pd.concat([df.drop(columns='props'),pd.get_dummies(df.props.apply(pd.Series).stack()).groupby(level=0).sum()],axis=1)
print(new_df)
Input:
level1 level2 level3 props
A A C X,Y
A B C Y,Z
D D F Y,Z
G G I X,Z
Output:
level1 level2 level3 X Y Z
0 A A C 1 1 0
1 A B C 0 1 1
2 D D F 0 1 1
3 G G I 1 0 1
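The one-liner packs several steps together. This sketch builds the answer's input in memory instead of reading test.csv (an assumption for self-containment) and prints the stacked intermediate: stack() yields one long Series indexed by (original row, list position), get_dummies turns each token into a column, and grouping on index level 0 collapses the rows back to one per original row.

```python
import pandas as pd

# The answer's input table, built in memory instead of read from test.csv
df = pd.DataFrame({
    "level1": ["A", "A", "D", "G"],
    "level2": ["A", "B", "D", "G"],
    "level3": ["C", "C", "F", "I"],
    "props": ["X,Y", "Y,Z", "Y,Z", "X,Z"],
})

# Split into lists, then spread each list across columns 0, 1, ...
wide = df["props"].str.split(",").apply(pd.Series)

# stack() gives a long Series with a (row, position) MultiIndex
stacked = wide.stack()
print(stacked)

# One indicator column per token, summed back to one row per original row
df1 = pd.get_dummies(stacked).groupby(level=0).sum()
print(df1)
```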
In [208]: df
Out[208]:
level1 level2 level3 props dummy
0 A A C [X, Y] 1
1 A B C [Y, Z] 1
2 D E F [Y, Z] 1
3 G H I [X, Z] 1
In [209]: df = pd.DataFrame({'level1': list('AADG'), 'level2': list("ABEH"), 'level3': list("CCFI"), 'props':[list("XY"), list("YZ"), list("YZ"), list("XZ")] })
In [210]: df
Out[210]:
level1 level2 level3 props
0 A A C [X, Y]
1 A B C [Y, Z]
2 D E F [Y, Z]
3 G H I [X, Z]
In [211]: df['dummy'] = 1
In [212]: df[['level1', 'level2', 'level3']].join(df.explode('props').pivot(columns='props', values='dummy')).fillna(value=0)
Out[212]:
level1 level2 level3 X Y Z
0 A A C 1.0 1.0 0.0
1 A B C 0.0 1.0 1.0
2 D E F 0.0 1.0 1.0
3 G H I 1.0 0.0 1.0
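The pivot result above is float because combinations that never occur are NaN before fillna replaces them. A small variant of the same explode/pivot approach, assuming the same list-valued props column, casts the indicators back to integers and adds the dummy column with assign instead of mutating df:

```python
import pandas as pd

df = pd.DataFrame({
    "level1": list("AADG"),
    "level2": list("ABEH"),
    "level3": list("CCFI"),
    "props": [list("XY"), list("YZ"), list("YZ"), list("XZ")],
})

# explode() emits one row per list element, repeating the original index
exploded = df.explode("props").assign(dummy=1)

# pivot() keeps the (repeated) index and makes one column per token;
# missing combinations are NaN, so fill with 0 and cast back to int
indicators = (
    exploded.pivot(columns="props", values="dummy")
    .fillna(0)
    .astype(int)
)

result = df[["level1", "level2", "level3"]].join(indicators)
print(result)
```

If props still holds comma-separated strings rather than lists, df["props"].str.split(",") would produce the list column that explode expects.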