Pandas DataFrame: converting columns into rows of a single column

Question

I have a dataframe that looks like this:

userId  feature1  feature2  feature3  ...
123456  0         0.45      0         ...
234567  0         0         0         ...
345678  0.6       0         0.2       ...
.
.

The features are mostly zeros, but occasionally some of them have non-zero values. A single row for a userId may have zero, one, or more non-zero features.

I want to transform this into the following dataset:

userId  feature  value
123456  feature2 0.45
345678  feature1 0.6
345678  feature3 0.2

Essentially, we retain only the features that are non-zero for each userId. So, for userId 345678, we have 2 rows in the transformed dataset, one for feature1 and the other for feature3. userId 234567 is dropped since none of its features are non-zero.

Is this something that can be done using groupby or pivoting? If so, how?

Any other idiomatic pandas solutions?

Answer 1

Magic from melt:

df.melt('userId').query('value!=0')
Out[459]: 
   userId  variable  value
2  345678  feature1   0.60
3  123456  feature2   0.45
8  345678  feature3   0.20
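
For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the three sample rows from the question, with the elided columns left out. The `var_name`, `value_name`, sorting, and index reset are my additions to match the column names and row order the question asks for; they are not part of the one-liner above.

```python
import pandas as pd

# Sample data copied from the question (only the three features shown there).
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0.0, 0.0, 0.6],
    "feature2": [0.45, 0.0, 0.0],
    "feature3": [0.0, 0.0, 0.2],
})

long_df = (
    # Unpivot every feature column into (feature, value) pairs per userId.
    df.melt(id_vars="userId", var_name="feature", value_name="value")
      # Keep only the non-zero entries.
      .query("value != 0")
      # Optional: order rows by user, then feature, and renumber the index.
      .sort_values(["userId", "feature"])
      .reset_index(drop=True)
)
print(long_df)
```

Users whose features are all zero (234567 here) disappear naturally, because none of their melted rows pass the filter.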

Note that if you use stack instead, you first need to mask the 0s to NaN so that they get dropped:

df.mask(df.eq(0)).set_index('userId').stack().reset_index()
Out[460]: 
   userId   level_1     0
0  123456  feature2  0.45
1  345678  feature1  0.60
2  345678  feature3  0.20
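
The stack output above ends up with the default column names `level_1` and `0`. A sketch of the same idea with named columns follows, again assuming the sample rows from the question. The explicit `dropna()` is a defensive assumption on my part: whether `stack` itself discards NaN entries can depend on the pandas version, so the sketch does not rely on it.

```python
import pandas as pd

# Sample data copied from the question (only the three features shown there).
df = pd.DataFrame({
    "userId": [123456, 234567, 345678],
    "feature1": [0.0, 0.0, 0.6],
    "feature2": [0.45, 0.0, 0.0],
    "feature3": [0.0, 0.0, 0.2],
})

# Move userId into the index first so that only feature values get masked.
wide = df.set_index("userId")

stacked = (
    wide.where(wide.ne(0))      # keep non-zero values, turn zeros into NaN
        .stack()                # Series indexed by (userId, column name)
        .dropna()               # drop NaN entries explicitly, whatever stack did
        .rename_axis(["userId", "feature"])
        .reset_index(name="value")
)
print(stacked)
```

Compared with melt, this route needs the extra masking step, so melt followed by a filter is probably the simpler choice for this problem.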

Solution 1 (accepted 2019-02-20 17:38:03)
