Pandas dataframe: convert columns into rows of a single column
I have a dataframe that looks like this:
userId feature1 feature2 feature3 ...
123456 0 0.45 0 ...
234567 0 0 0 ...
345678 0.6 0 0.2 ...
.
.
The features are mostly zeros, but occasionally some of them have non-zero values. A single row for a userId may have zero, one, or more non-zero features.
I want to transform this into the following dataset:
userId feature value
123456 feature2 0.45
345678 feature1 0.6
345678 feature3 0.2
Essentially, we retain only the features that are non-zero for each userId. So, for userId 345678, we have 2 rows in the transformed dataset, one for feature1 and the other for feature3. userId 234567 is dropped since all of its features are zero.
Is this something that can be done using groupby or pivoting? If so, how?
Any other idiomatic pandas solutions?
Magic from melt:
df.melt('userId').query('value!=0')
Out[459]:
userId variable value
2 345678 feature1 0.60
3 123456 feature2 0.45
8 345678 feature3 0.20
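For anyone who wants to reproduce this end to end, here is a minimal sketch. The sample frame is rebuilt from the three rows shown in the question, and the `var_name`/`value_name` arguments, the sort, and the index reset are my additions to match the column names and row order of the desired output; they are not part of the one-liner above.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (assumed, not the real dataset).
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0, 0, 0.6],
    "feature2": [0.45, 0, 0],
    "feature3": [0, 0, 0.2],
})

long_df = (
    # Wide to long: one row per (userId, feature) pair.
    df.melt(id_vars="userId", var_name="feature", value_name="value")
      # Keep only the non-zero entries.
      .query("value != 0")
      # Optional: group each user's rows together and renumber the index.
      .sort_values(["userId", "feature"])
      .reset_index(drop=True)
)

print(long_df)
```

Naming the columns inside `melt` avoids a separate `rename` step afterwards.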
Note that if you use stack, you need to mask the 0 values to NaN first:
df.mask(df.eq(0)).set_index('userId').stack().reset_index()
Out[460]:
userId level_1 0
0 123456 feature2 0.45
1 345678 feature1 0.60
2 345678 feature3 0.20
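The stack result above comes back with the default column names `level_1` and `0`. A possible variant that names them directly is sketched below, under a few assumptions: the sample frame is rebuilt from the question, `userId` is moved into the index before masking so only feature columns are compared against zero, and an explicit `dropna()` is added because whether `stack()` drops NaN on its own depends on the pandas version.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question (assumed, not the real dataset).
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0, 0, 0.6],
    "feature2": [0.45, 0, 0],
    "feature3": [0, 0, 0.2],
})

# Move userId into the index so only feature columns are masked.
features = df.set_index("userId")

stacked = (
    features.mask(features.eq(0))   # replace zeros with NaN
            .stack()                # long Series indexed by (userId, column name)
            .dropna()               # explicit, in case stack() keeps NaN in this pandas version
            .rename_axis(["userId", "feature"])
            .reset_index(name="value")
)

print(stacked)
```

On recent pandas 2.x releases the plain `stack()` call may emit a FutureWarning about its upcoming implementation; the result here should be the same either way because of the explicit `dropna()`.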