How can I convert to a tidy format in Python?
My pandas DataFrame has several one-hot encoded columns and a total column at the end that sums them (total = val1 + val2).
Some rows have 1s in more than one val column:
| name | val1 | val2 | total |
| joe | 1 | 0 | 1 |
| bob | 0 | 1 | 1 |
| dan | 1 | 1 | 2 |
I want this:
| name | val1 | val2 | total |
| joe | 1 | 0 | 1 |
| bob | 0 | 1 | 1 |
| dan | 1 | 0 | 1 |
| dan | 0 | 1 | 1 |
I can't figure out how to make this work: I want to melt the frame conditionally on the total column.
The end result should have a total of 1 in every row.
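For anyone wanting to reproduce this, the input table above can be rebuilt roughly as follows. This is a minimal sketch; the variable name df is simply what the snippets in this thread assume.

```python
import pandas as pd

# Example input from the table above: one-hot val columns plus their row sum
df = pd.DataFrame({
    "name": ["joe", "bob", "dan"],
    "val1": [1, 0, 1],
    "val2": [0, 1, 1],
})
df["total"] = df["val1"] + df["val2"]
```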
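# Stack to long form and keep only the (name, column) pairs whose value is 1
d = df.drop('total', axis=1).set_index('name').stack().loc[lambda x: x == 1]
n, v = zip(*d.index)
# dtype=int keeps 0/1 output on pandas >= 2.0, where get_dummies defaults to bool
pd.concat([pd.Series(n, name='name'), pd.get_dummies(v, dtype=int).assign(total=1)], axis=1)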
name val1 val2 total
0 joe 1 0 1
1 bob 0 1 1
2 dan 1 0 1
3 dan 0 1 1
Harder than I thought:
s1 = df.iloc[:, 1:-1]  # the one-hot val columns
s2 = df.iloc[:, 0]     # the name column
df[['name']].join(s1.mul(s2, 0).replace('', np.nan).stack().reset_index(level=1)['level_1'].str.get_dummies(), how='right').assign(Total=1)
Out[413]:
name val1 val2 Total
0 joe 1 0 1
1 bob 0 1 1
2 dan 1 0 1
2 dan 0 1 1
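Since the question asks about melt specifically, here is a melt-based sketch of the same reshaping for comparison. It is not taken from either answer above; the helper names val_cols, melted, col, and flag are made up for illustration, and the example frame is rebuilt inline so the block runs on its own.

```python
import pandas as pd

# Example input, same as in the question
df = pd.DataFrame({
    "name": ["joe", "bob", "dan"],
    "val1": [1, 0, 1],
    "val2": [0, 1, 1],
    "total": [1, 1, 2],
})

val_cols = ["val1", "val2"]  # assumed list of the one-hot columns

# Long format: one row per (name, val column) pair, keeping only the 1s.
# The original row position is carried along so the input order can be restored.
melted = (
    df.reset_index()
      .melt(id_vars=["index", "name"], value_vars=val_cols,
            var_name="col", value_name="flag")
      .query("flag == 1")
      .sort_values(["index", "col"])
)

# Re-encode each surviving pair as its own one-hot row
dummies = pd.get_dummies(melted["col"], dtype=int).reindex(columns=val_cols, fill_value=0)
tidy = pd.concat([melted[["name"]], dummies], axis=1).reset_index(drop=True)

# Recompute total from the new rows instead of hard-coding 1
tidy["total"] = tidy[val_cols].sum(axis=1)
print(tidy)
```

Recomputing total from the val columns, rather than assigning 1 directly, doubles as a check that every output row really carries exactly one flag.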