

Pandas Data frame combination of columns based on value

I have a pandas dataframe with 13 columns: ID (unique identifier), A1, A2, ..., A12. All A columns can have 2 values: 0 or 1.

import pandas as pd

d = {'ID': ['ID1', 'ID2', 'ID3', 'ID4'], 'A1': [0,0,0,1], 'A2': [1,0,0,1], 'A3': [0,0,0,0], 'A4': [1,1,0,1], 'A5': [0,0,0,1],
     'A6': [0,1,0,0], 'A7': [1,1,0,1], 'A8': [1,0,0,0], 'A9': [1,1,0,1], 'A10': [0,1,0,0], 'A11': [1,1,1,0], 'A12': [1,0,1,1]}
df = pd.DataFrame(data=d)
df

I want to add a new column, A_combined, whose value combines the other 12 columns: the names of the columns whose value is 1 in that row. For example, if the row is

ID1 1 0 0 0 0 1 0 0 1 0 1 0 

then A_combined will have the value A1_A6_A9_A11.
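A minimal sketch of the transformation described above, assuming the `df` from the question (an `ID` column plus 0/1 columns A1..A12). It uses a plain list comprehension over the rows and keeps the column names in their original left-to-right order; the helper name `a_cols` is only for illustration.

```python
import pandas as pd

d = {'ID': ['ID1', 'ID2', 'ID3', 'ID4'],
     'A1': [0, 0, 0, 1], 'A2': [1, 0, 0, 1], 'A3': [0, 0, 0, 0],
     'A4': [1, 1, 0, 1], 'A5': [0, 0, 0, 1], 'A6': [0, 1, 0, 0],
     'A7': [1, 1, 0, 1], 'A8': [1, 0, 0, 0], 'A9': [1, 1, 0, 1],
     'A10': [0, 1, 0, 0], 'A11': [1, 1, 1, 0], 'A12': [1, 0, 1, 1]}
df = pd.DataFrame(data=d)

# Every column except the identifier is a 0/1 flag column (hypothetical helper name)
a_cols = [c for c in df.columns if c != 'ID']

# For each row, keep the names of the flag columns that hold 1, joined with "_"
df['A_combined'] = [
    '_'.join(col for col, val in zip(a_cols, row) if val == 1)
    for row in df[a_cols].to_numpy()
]

print(df[['ID', 'A_combined']])
```

A row with no 1s at all would simply get an empty string.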

Any help would be highly appreciated!

UPDATE

I am able to restructure the dataframe using @wen's suggestions:

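import numpy as np

# Keep only the flag columns A1..A12 (column 0 is ID)
v=df.iloc[:,1:]

# 0 -> NaN so only the 1s survive; stack gives one row per (row index, column name)
test=v.replace(0,np.nan).stack().dropna().reset_index()

test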

Here 'test' has the column names at row level, with one row per 1 in the original data. Any suggestions on next steps to combine the row values by index? Thanks!

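dd = pd.DataFrame({'ID': ['ID1'], 'A1': [0], 'A2': [0], 'A3': [1], 'A12': [1]})
v=dd.iloc[:,1:]
# 1*'A3' -> 'A3' and 0*'A3' -> ''; blanks become NaN and are dropped, then names are joined per row
dd['Acombine']=v.mul(v.columns).replace('',np.nan).stack().dropna().groupby(level=0).apply('_'.join)
dd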
Out[859]: 
    ID  A1  A2  A3  A12 Acombine
0  ID1   0   0   1    1   A3_A12
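The `mul(v.columns)` step relies on `0 * 'A3'` giving an empty string and `1 * 'A3'` giving `'A3'`. A sketch of the same idea with `numpy.where`, which picks the column name where the flag is 1 and an empty string elsewhere, then joins the non-empty names per row. The one-row `dd` is reconstructed from the output shown above.

```python
import numpy as np
import pandas as pd

# One-row frame reconstructed from the output above
dd = pd.DataFrame({'ID': ['ID1'], 'A1': [0], 'A2': [0], 'A3': [1], 'A12': [1]})

v = dd.iloc[:, 1:]  # flag columns only (skip ID)

# Column name where the flag is 1, '' otherwise; the names broadcast across every row
names = np.where(v.eq(1).to_numpy(), v.columns.to_numpy(), '')

# filter(None, ...) drops the empty strings before joining
dd['Acombine'] = ['_'.join(filter(None, row)) for row in names]

print(dd)
```

This avoids both the string multiplication and the stack/groupby round trip.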

Not too sure if I am following your example completely (i.e. "combination of the 12 other columns, if their value is 1": if what is 1, the first column?).

df.loc[df['A1'] == 1, 'A_'] = df['A1'].astype(str)+df['A2'].astype(str)+df['A3'].astype(str)

This code reads as follows: if column 'A1' is equal to 1, then create and fill column 'A_' with the values from columns A1, A2 and A3, concatenated as strings. The far-right part of the code could be extended to include all 12 columns.
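Extending that answer to all 12 flag columns could look like the sketch below, assuming the question's `df`. Instead of writing out twelve `astype(str)` terms, each row's values are concatenated with `''.join`. Note that this yields a string of 0/1 digits, not column names; `a_cols`, `bits` and `mask` are illustrative names.

```python
import pandas as pd

d = {'ID': ['ID1', 'ID2', 'ID3', 'ID4'],
     'A1': [0, 0, 0, 1], 'A2': [1, 0, 0, 1], 'A3': [0, 0, 0, 0],
     'A4': [1, 1, 0, 1], 'A5': [0, 0, 0, 1], 'A6': [0, 1, 0, 0],
     'A7': [1, 1, 0, 1], 'A8': [1, 0, 0, 0], 'A9': [1, 1, 0, 1],
     'A10': [0, 1, 0, 0], 'A11': [1, 1, 1, 0], 'A12': [1, 0, 1, 1]}
df = pd.DataFrame(data=d)

a_cols = [c for c in df.columns if c != 'ID']  # A1 ... A12 in their original order

# Concatenate each row's 0/1 values into one string, e.g. "010100111011" for ID1
bits = df[a_cols].astype(str).apply(''.join, axis=1)

# As in the answer, only fill A_ where A1 == 1; the other rows get NaN
mask = df['A1'] == 1
df.loc[mask, 'A_'] = bits[mask]

print(df[['ID', 'A_']])
```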

I believe the code below is what you're looking for, without having to restructure the data. It uses a temporary dataframe in which each 1 is replaced with the corresponding column header. It then builds a new column in which each row's values are joined as you want, and returns the original dataframe with that column added.

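import pandas as pd

# Temporary frame: each 1 in an A column is replaced by that column's name (ID is skipped)
df2 = pd.DataFrame()
for col in df.columns.drop('ID'):
    df2[col] = df[col].replace(1, col)

def func(x):
    # Join the non-zero entries of a row, i.e. the names of the columns that held 1
    return '_'.join(str(i) for i in x if i != 0)

# assign returns a new DataFrame; use df = df.assign(...) to keep the column
df.assign(combined = df2.apply(func, axis=1))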

    A1  A10 A11 A12 A2  A3  A4  A5  A6  A7  A8  A9  combined
0   0   0   1   1   1   0   1   0   0   1   1   1   A11_A12_A2_A4_A7_A8_A9
1   0   1   1   0   0   0   1   0   1   1   0   1   A10_A11_A4_A6_A7_A9
2   0   0   1   1   0   0   0   0   0   0   0   0   A11_A12
3   1   0   0   1   1   0   1   1   0   1   0   1   A1_A12_A2_A4_A5_A7_A9
