Stack selected columns as rows in a pandas DataFrame

Question

Suppose I have df_in below:

import pandas as pd

df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})

df_in:

+---+---+---+---+
|   | X | A | B |
+---+---+---+---+
| 0 | a | 1 | 1 |
| 1 | b | 0 | 1 |
| 2 | c | 0 | 0 |
+---+---+---+---+

I want to achieve something like the following:

df_out = pd.DataFrame({'X': ['a', 'a', 'b'], 'Y': ['A', 'B', 'B']})

df_out:

+---+---+---+
|   | X | Y |
+---+---+---+
| 0 | a | A |
| 1 | a | B |
| 2 | b | B |
+---+---+---+

I also have a list containing the columns: l = ['A', 'B']. The logic is: for each column in df_in that is in l, repeat the observations where that column's value == 1, and write the column name into a new column of df_out (Y in the example). In reality df_in has more columns, and not all of them are in l, which is why I want to solve this without explicitly referencing columns A, B and X.
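The logic described above can also be expressed with DataFrame.melt instead of stacking. This is a swapped-in alternative, not the accepted answer's method. It is a minimal sketch assuming pandas 1.1 or later (for melt(..., ignore_index=False)); the temporary column name flag is hypothetical and must not clash with an existing column of df_in.

```python
import pandas as pd

df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})
l = ['A', 'B']

# Every column not listed in l is treated as an identifier and carried along.
id_cols = [col for col in df_in.columns if col not in l]

df_out = (
    # 'flag' is a hypothetical helper column name; it must not already exist.
    df_in.melt(id_vars=id_cols, value_vars=l, var_name='Y', value_name='flag',
               ignore_index=False)      # keep original row labels for re-ordering
    .query('flag == 1')                 # keep only the observations flagged with 1
    .sort_index(kind='mergesort')       # stable sort: original row order, then order of l
    .drop(columns='flag')
    .reset_index(drop=True)
)
print(df_out)
```

The stable sort on the original index restores the row-by-row order that stacking produces; without it, melt groups the output by column (all A matches first, then all B matches).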

NOTE: This is not entirely covered by this answer since, as stated above, there are many columns in reality, and they can be of any type and hold any data, so the solution (df_out) needs to carry along all of the original columns (X in this case). In theory, X can also be a binary 0/1 column, but it should only affect the outcome the way A and B do if it is included in l. I hope this helps clarify.
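To illustrate the requirement that every other column is carried along unchanged, here is a NumPy-based sketch (a swapped-in technique, not taken from the answer). The frame is hypothetical: X is a binary 0/1 column that is not in l, so it must not influence which rows are produced, and Z is an extra float column whose dtype should survive.

```python
import numpy as np
import pandas as pd

# Hypothetical data: X is binary but NOT in l, Z is an extra float column.
df_in = pd.DataFrame({
    'X': [1, 0, 1],
    'A': [1, 0, 0],
    'B': [1, 1, 0],
    'Z': [0.5, 1.5, 2.5],
})
l = ['A', 'B']

# (row, position-in-l) of every cell equal to 1, in row-major order.
rows, cols = np.nonzero(df_in[l].to_numpy() == 1)

df_out = (
    df_in.drop(columns=l)           # all remaining columns, dtypes untouched
    .iloc[rows]                     # repeat each row once per matching column
    .assign(Y=np.asarray(l)[cols])  # name of the column that matched
    .reset_index(drop=True)
)
print(df_out)
```

Because the non-selected columns are selected with iloc rather than moved into an index, their values and dtypes pass through as-is, and X being 0/1 has no effect since it is not in l.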

Answer 1

Use Index.difference to get all columns that are not in l and pass them to DataFrame.set_index, reshape with DataFrame.stack, keep only the values equal to 1, and finally convert the remaining MultiIndex to a new DataFrame with MultiIndex.to_frame, renaming the last column:

l = ['A', 'B']

c = df_in.columns.difference(l, sort=False).tolist()
s = df_in.set_index(c).stack()
df_out = s[s == 1].index.to_frame(index=False).rename(columns={len(c):'Y'})
print(df_out)
   X  Y
0  a  A
1  a  B
2  b  B
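The same set_index/stack approach can be wrapped in a reusable function. A minimal sketch, where stack_selected is a hypothetical name: instead of renaming the column called len(c) (the position-based name that to_frame gives the unnamed stacked level), it names the index levels explicitly with rename_axis, so the output column can be called something other than Y.

```python
import pandas as pd


def stack_selected(df, cols, name='Y'):
    """Hypothetical helper wrapping the answer's set_index/stack approach.

    Every column not in ``cols`` becomes part of the index, the ``cols``
    columns are stacked into rows, and only cells equal to 1 are kept.
    """
    keep = df.columns.difference(cols, sort=False).tolist()
    s = df.set_index(keep)[cols].stack()
    # Name all index levels explicitly instead of relying on positions.
    s = s.rename_axis(keep + [name])
    return s[s == 1].index.to_frame(index=False)


df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})
print(stack_selected(df_in, ['A', 'B']))
```

Since every non-selected column ends up in the index, its values come back unchanged in the result; rows with no 1 in any selected column disappear, as the question requires.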

(Answer 1 was accepted; posted 2020-02-11 11:08:06.)
