Stacking select columns as rows in pandas dataframe
Suppose I have df_in below:
import pandas as pd
df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})
df_in
+---+---+---+---+
| | X | A | B |
+---+---+---+---+
| 0 | a | 1 | 1 |
| 1 | b | 0 | 1 |
| 2 | c | 0 | 0 |
+---+---+---+---+
I want to achieve something like the following:
df_out = pd.DataFrame({'X': ['a', 'a', 'b'], 'Y': ['A', 'B', 'B']})
df_out
+---+---+---+
| | X | Y |
+---+---+---+
| 0 | a | A |
| 1 | a | B |
| 2 | b | B |
+---+---+---+
I also have a list containing the columns: l = list(['A', 'B']). The logic is: for each column in df_in that is in l, repeat those observations where the column value == 1, and add the column name to a new column in df_out (Y in the example). In reality there are more columns in df_in, and not all of them are in l, which is why I want to solve this without explicit references to columns A, B and X.
NOTE: This is not entirely covered by this answer since, as stated above, there are many columns in reality, and these can be of any type and data, so the solution, df_out, needs to take into account all of the original columns (X in this case). In theory, X can also be a binary 0/1 column, but it should only affect the outcome in the same way as A and B if it's included in l. I hope this helps clarify.
Use Index.difference to get all columns that are not in l and pass them to DataFrame.set_index, reshape with DataFrame.stack, keep only the values equal to 1, and finally convert the resulting MultiIndex to a new DataFrame with MultiIndex.to_frame, using rename to name the last column:
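l = ['A', 'B']
# Columns that are not in l, in their original order, become the index
c = df_in.columns.difference(l, sort=False).tolist()
# Stack the columns in l into one Series indexed by (index columns..., column name)
s = df_in.set_index(c).stack()
# Keep entries equal to 1; the unnamed last index level becomes column len(c), renamed to 'Y'
df_out = s[s == 1].index.to_frame(index=False).rename(columns={len(c): 'Y'})
print(df_out)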
   X  Y
0  a  A
1  a  B
2  b  B
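The same reshaping can also be sketched with DataFrame.melt instead of set_index/stack. This is only an alternative under stated assumptions, not part of the answer above: the helper name stack_flag_columns, the temporary column name "_flag" and the extra column Z are hypothetical and chosen for illustration, and the ignore_index argument of melt requires pandas 1.1 or later. The sketch assumes no existing column is already called "_flag".

```python
import pandas as pd


def stack_flag_columns(df, flag_cols, new_col="Y"):
    """Return one row per (original row, flag column) pair whose value is 1.

    Hypothetical helper: the name and the temporary "_flag" column are
    illustrative only.
    """
    # Every column not named in flag_cols is carried through unchanged.
    id_cols = [c for c in df.columns if c not in flag_cols]
    # Long format: one row per (original row, flag column); the original
    # index is kept so the row order can be restored afterwards.
    long = df.melt(id_vars=id_cols, value_vars=flag_cols,
                   var_name=new_col, value_name="_flag", ignore_index=False)
    # Keep flagged rows, then restore the original row order. A stable sort
    # keeps the flag columns in the order given in flag_cols.
    out = long[long["_flag"] == 1].sort_index(kind="mergesort")
    return out.drop(columns="_flag").reset_index(drop=True)


# Z is an extra column that is not in the list, so it is carried along as-is.
df_in = pd.DataFrame({"X": ["a", "b", "c"], "A": [1, 0, 0],
                      "B": [1, 1, 0], "Z": [0, 1, 1]})
df_out = stack_flag_columns(df_in, ["A", "B"])
print(df_out)
```

Because the identifier columns are collected as "everything not in the list", a binary column such as Z only affects the result if it is added to the list, which matches the requirement stated in the question.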