Pandas DataFrame: combining columns based on their values

Question

I have a pandas DataFrame with 13 columns: ID (a unique identifier) and A1, A2, ..., A12. Each A column holds one of two values, 0 or 1.

import pandas as pd

d = {'ID': ['ID1', 'ID2', 'ID3', 'ID4'], 'A1': [0,0,0,1], 'A2': [1,0,0,1], 'A3': [0,0,0,0], 'A4': [1,1,0,1], 'A5': [0,0,0,1],
     'A6': [0,1,0,0], 'A7': [1,1,0,1], 'A8': [1,0,0,0], 'A9': [1,1,0,1], 'A10': [0,1,0,0], 'A11': [1,1,1,0], 'A12': [1,0,1,1]}
df = pd.DataFrame(data=d)
df

I want to add a new column, A_combined, whose value joins the names of the 12 A columns that are equal to 1 in that row. For example, if the row is

ID1 1 0 0 0 0 1 0 0 1 0 1 0

then A_combined should have the value A1_A6_A9_A11.

Any help would be highly appreciated!

UPDATE

I was able to restructure the dataframe using @wen's suggestions:

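import numpy as np

# Select the A columns by name; a positional slice such as df.iloc[:, :12]
# depends on the column order and can pick up the ID column
v = df.drop(columns='ID')

# v.mul(v) leaves 0/1 values unchanged; zeros are turned into NaN before stacking
test = v.mul(v).replace(0, np.nan).stack().reset_index()

test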

Here 'test' has the column names at row level. Any suggestions on how to combine the row values by index next? Thanks!
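
One possible way to finish from the stacked form, as a rough sketch rather than a definitive answer: keep only the cells equal to 1, then join the column labels per ID. The three-column frame and the names `stacked`, `long_form`, `col`, `flag` and `combined` are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical miniature of the question's data
df = pd.DataFrame({
    'ID': ['ID1', 'ID2', 'ID3'],
    'A1': [1, 0, 0],
    'A2': [0, 1, 0],
    'A3': [1, 1, 0],
})

v = df.set_index('ID')

# Cells that are not 1 become NaN; drop them explicitly after stacking
stacked = v.where(v.eq(1)).stack().dropna()

# Long form: one row per (ID, column) pair whose value was 1
long_form = stacked.rename_axis(['ID', 'col']).reset_index(name='flag')

# Join the column names belonging to each ID with underscores
combined = long_form.groupby('ID', sort=False)['col'].agg('_'.join)

# Map back onto the original rows; an ID with no 1 at all gets an empty string
df['A_combined'] = df['ID'].map(combined).fillna('')
```

The explicit `dropna()` is there so the result does not depend on whether `stack()` discards missing values by itself.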

Answer 1

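import numpy as np

# dd is the dataframe; skip the first column (ID)
v = dd.iloc[:, 1:]
# Intent: multiplying each 0/1 by its column name gives the name for 1 and '' for 0
dd['Acombine'] = v.mul(v.columns).replace('', np.nan).stack().groupby(level=0).apply('_'.join)
dd
Out[859]: 
    ID  A1  A2  A3  A12 Acombine
0  ID1   0   0   1    1   A3_A12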
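
The same idea (keep the column name where the cell is 1, blank it elsewhere, then join) can also be sketched with `numpy.where`, which avoids relying on integer-by-string multiplication. This is only an illustrative variant, not the answerer's code: the two-row frame is invented to resemble the output above, and `names` and `labels` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the output shown above, plus a second row
dd = pd.DataFrame({
    'ID': ['ID1', 'ID2'],
    'A1': [0, 1],
    'A2': [0, 0],
    'A3': [1, 0],
    'A12': [1, 1],
})

v = dd.iloc[:, 1:]                          # every column except ID
names = v.columns.to_numpy(dtype=object)    # column names as an object array

# 2-D array: the column name where the cell is 1, an empty string elsewhere
labels = np.where(v.eq(1).to_numpy(), names, '')

# Join the non-empty labels of each row with underscores
dd['Acombine'] = ['_'.join(filter(None, row)) for row in labels]
```

As in the answer, `iloc[:, 1:]` assumes that ID is the first column.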

Answer 2

Not too sure I am following your example completely (i.e. "combination of the 12 other columns, if their value is 1": if what is 1, the first column?).

df.loc[df['A1'] == 1, 'A_'] = df['A1'].astype(str) + df['A2'].astype(str) + df['A3'].astype(str)

This code reads as follows: where column 'A1' equals 1, create and fill column 'A_' with the values from columns A1, A2, and A3, concatenated as strings. The right-hand side could be extended to include all 12 columns.
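
Extending the right-hand side to many columns by hand gets long, so here is a hedged sketch that builds the string from a list of column names instead. The miniature frame and the names `cols`, `mask` and `bits` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature frame with three A columns instead of twelve
df = pd.DataFrame({
    'ID': ['ID1', 'ID2', 'ID3'],
    'A1': [1, 0, 1],
    'A2': [1, 1, 0],
    'A3': [0, 1, 1],
})

# With the full data this list would be [f'A{i}' for i in range(1, 13)]
cols = ['A1', 'A2', 'A3']

mask = df['A1'] == 1

# Concatenate the 0/1 digits of every listed column, row by row
bits = df[cols].astype(str).apply(lambda row: ''.join(row), axis=1)

# Fill the new column only where A1 is 1; the other rows stay missing (NaN)
df.loc[mask, 'A_'] = bits[mask]
```

Note that this produces strings of digits such as '110', not the underscore-joined column names the question asks for.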

Answer 3

I believe the answer below is what you're looking for, without having to restructure the data. It uses a temporary dataframe in which each 1 is replaced with the header of its column. It then builds a new column by joining the values in each row as required, and adds it back to the original dataframe.

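# Temporary dataframe: each 1 is replaced by the name of its column
df2 = pd.DataFrame()
for col in df.columns:
    df2[col] = df[col].replace(1, col)

# Join the non-zero entries of a row (the column names) with underscores
def func(x):
    return '_'.join(str(i) for i in x if i != 0)

# assign returns a new dataframe with the extra column; df itself is unchanged
df.assign(combined = df2.apply(func, axis=1))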

    A1  A10 A11 A12 A2  A3  A4  A5  A6  A7  A8  A9  combined
0   0   0   1   1   1   0   1   0   0   1   1   1   A11_A12_A2_A4_A7_A8_A9
1   0   1   1   0   0   0   1   0   1   1   0   1   A10_A11_A4_A6_A7_A9
2   0   0   1   1   0   0   0   0   0   0   0   0   A11_A12
3   1   0   0   1   1   0   1   1   0   1   0   1   A1_A12_A2_A4_A5_A7_A9
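
The output above has no ID column and lists the names in alphabetical order (A1, A10, A11, ...). A minimal row-wise sketch on the question's own data, assuming the ID column is present and the names should follow their numeric suffix; `a_cols` and `combine` are hypothetical names introduced here.

```python
import pandas as pd

d = {'ID': ['ID1', 'ID2', 'ID3', 'ID4'], 'A1': [0,0,0,1], 'A2': [1,0,0,1], 'A3': [0,0,0,0],
     'A4': [1,1,0,1], 'A5': [0,0,0,1], 'A6': [0,1,0,0], 'A7': [1,1,0,1], 'A8': [1,0,0,0],
     'A9': [1,1,0,1], 'A10': [0,1,0,0], 'A11': [1,1,1,0], 'A12': [1,0,1,1]}
df = pd.DataFrame(data=d)

# Only the A columns, ordered by their numeric suffix (A1, A2, ..., A12)
a_cols = sorted((c for c in df.columns if c != 'ID'), key=lambda c: int(c[1:]))

def combine(row):
    # Keep the names of the columns whose value in this row is 1
    return '_'.join(c for c in a_cols if row[c] == 1)

# One joined string per row, stored next to the original columns
df['A_combined'] = df[a_cols].apply(combine, axis=1)
```

For ID3, whose only non-zero columns are A11 and A12, this gives A11_A12, matching row 2 of the output above.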

3 answers

Answer 1
1 2018-02-13 19:52:29

Answer 2
1 2018-02-13 19:55:45

Answer 3
0 Accepted 2018-02-13 19:46:05
