Pandas: conditionally select multiple columns

Question

Suppose I have a dataframe:

C1 V1 C2 V2 Cond
1  2  3  4  X  
5  6  7  8  Y  
9  10 11 12 X

The statement should return: if Cond == X, pick C1 and V1, else pick C2 and V2.

The output dataframe would look something like:

C  V
1  2
7  8
9  10

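Edit: one more requirement: the number of columns can change, but the names follow a pattern. In this case, select all columns with "1" in the name, otherwise those with "2". I think a hard-coded solution might not work.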
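
For anyone who wants to run the answers below, the example frame can be built as in this minimal sketch. The variable name `df` is what the answers assume; `expected` is a hypothetical name used here only to write down the desired result for comparison.

```python
import pandas as pd

# Example frame from the question; the answers below refer to it as `df`.
df = pd.DataFrame({
    'C1': [1, 5, 9],
    'V1': [2, 6, 10],
    'C2': [3, 7, 11],
    'V2': [4, 8, 12],
    'Cond': ['X', 'Y', 'X'],
})

# Desired result written out by hand: the "1" columns where Cond is X,
# the "2" columns otherwise. `expected` is a name made up for this sketch.
expected = pd.DataFrame({'C': [1, 7, 9], 'V': [2, 8, 10]})
```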

Answer 1

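Drop Cond to focus on the values I am selecting from
Reshape the numpy array so that the two column groups can be told apart with a boolean
Index the first dimension with np.arange(len(df)), once per row
Index the second dimension with df.Cond.ne('X').mul(1); 0 means the row equals X
Construct the final dataframe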

pd.DataFrame(
    df.drop(columns='Cond').values.reshape(3, 2, 2)[
        np.arange(len(df)),
        df.Cond.ne('X').mul(1)
    ], df.index, ['C', 'V'])

   C   V
0  1   2
1  7   8
2  9  10
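
The reshape above hard-codes the frame's dimensions. A possible variation, sketched here under the assumption that the columns are ordered group by group (all "1" columns, then all "2" columns), lets NumPy infer the sizes instead. The names `cube`, `group` and `n_groups` are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C1': [1, 5, 9], 'V1': [2, 6, 10],
    'C2': [3, 7, 11], 'V2': [4, 8, 12],
    'Cond': ['X', 'Y', 'X'],
})

values = df.drop(columns='Cond').to_numpy()

# Assumption: two equally sized column groups, laid out as C1 V1 | C2 V2.
n_groups = 2

# Shape (rows, groups, columns per group); -1 lets NumPy infer the last axis.
cube = values.reshape(len(df), n_groups, -1)

# 0 where Cond is X (first group), 1 otherwise (second group).
group = df['Cond'].ne('X').astype(int).to_numpy()

# Pair each row number with its group number to pick one slice per row.
picked = cube[np.arange(len(df)), group]

result = pd.DataFrame(picked, index=df.index, columns=['C', 'V'])
```

If the columns were interleaved differently (for example C1 C2 V1 V2), the reshape would group the wrong values together, so the column order matters here.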

Answer 2

I tried to create a more general solution with filter and numpy.where; for the new column names, use extract:

#if necessary sort columns
df = df.sort_index(axis=1)

#filter df by 1 and 2
df1 = df.filter(like='1')
df2 = df.filter(like='2')
print (df1)
   C1  V1
0   1   2
1   5   6
2   9  10

print (df2)
   C2  V2
0   3   4
1   7   8
2  11  12

#np.where needs a mask with the same shape as df1 and df2
mask = pd.concat([df.Cond == 'X']*len(df1.columns), axis=1)
print (mask)
    Cond   Cond
0   True   True
1  False  False
2   True   True

cols = df1.columns.str.extract('([A-Za-z])', expand=False)
print (cols)
Index(['C', 'V'], dtype='object')

print (np.where(mask, df1,df2))
[[ 1  2]
 [ 7  8]
 [ 9 10]]

print (pd.DataFrame(np.where(mask, df1, df2), index=df.index, columns=cols))
   C   V
0  1   2
1  7   8
2  9  10
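
As a possible shortcut (not part of the answer above), the mask does not have to be repeated once per column: a boolean column vector of shape (n_rows, 1) is broadcast across the columns by NumPy. The sketch below assumes the same `df` as in the question; `is_x` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C1': [1, 5, 9], 'V1': [2, 6, 10],
    'C2': [3, 7, 11], 'V2': [4, 8, 12],
    'Cond': ['X', 'Y', 'X'],
})

df1 = df.filter(like='1')
df2 = df.filter(like='2')

# Column vector of shape (n_rows, 1); broadcasting stretches it over all columns.
is_x = df['Cond'].eq('X').to_numpy()[:, None]

# Strip the trailing digits from the "1" column names ('C1' -> 'C').
cols = df1.columns.str.replace(r'\d+$', '', regex=True)

result = pd.DataFrame(np.where(is_x, df1, df2), index=df.index, columns=cols)
```

Note that `filter(like='1')` matches any column whose name contains "1", so a name such as `C12` would land in both groups; a stricter pattern may be needed for real data.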

Answer 3

If the order of the rows does not matter, you can use df.loc and pd.concat (DataFrame.append was removed in pandas 2.0).

ndf1 = df.loc[df['Cond'] == 'X', ['C1','V1']]
ndf2 = df.loc[df['Cond'] == 'Y', ['C2','V2']]
ndf1.columns = ['C','V']
ndf2.columns = ['C','V']

result = pd.concat([ndf1, ndf2]).reset_index(drop=True)
print(result)
   C   V
0  1   2
1  9  10
2  7   8
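
If the row order does matter, one way to keep it, sketched here as an assumption rather than as part of the answer, is to leave the original index in place and sort on it after concatenating. The sketch also uses `~is_x` for the second piece, so any value other than X falls into the "else" branch.

```python
import pandas as pd

df = pd.DataFrame({
    'C1': [1, 5, 9], 'V1': [2, 6, 10],
    'C2': [3, 7, 11], 'V2': [4, 8, 12],
    'Cond': ['X', 'Y', 'X'],
})

is_x = df['Cond'] == 'X'

# Select each piece and give both the same column labels.
ndf1 = df.loc[is_x, ['C1', 'V1']].set_axis(['C', 'V'], axis=1)
ndf2 = df.loc[~is_x, ['C2', 'V2']].set_axis(['C', 'V'], axis=1)

# Both pieces keep their original index labels, so sorting restores row order.
result = pd.concat([ndf1, ndf2]).sort_index()
```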

Answer 4

Another option, using DataFrame.where():

df[['C1', 'V1']].where(df.Cond == "X", df[['C2', 'V2']].values)

#  C1   V1
#0  1    2
#1  7    8
#2  9   10
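
The result above keeps the names C1 and V1. A hedged variation that selects the columns by pattern and renames them might look like the following; it assumes every column name ends in "1" or "2", and the names `first` and `second` are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'C1': [1, 5, 9], 'V1': [2, 6, 10],
    'C2': [3, 7, 11], 'V2': [4, 8, 12],
    'Cond': ['X', 'Y', 'X'],
})

first = df.filter(regex=r'1$')    # columns whose name ends in "1"
second = df.filter(regex=r'2$')   # columns whose name ends in "2"

# `other` is passed as a plain array so values are matched by position,
# not by column label (the labels of the two groups differ).
result = first.where(df['Cond'].eq('X'), second.to_numpy())

# Drop the trailing "1" from the column names ('C1' -> 'C').
result.columns = result.columns.str.rstrip('1')
```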

Answer 5

You could try an approach similar to the one in this post.

First, define a couple of functions:

def cond(row):
    return row['Cond'] == 'X'

def helper(row, col_if, col_ifnot):
    return row[col_if] if cond(row) else row[col_ifnot]

Then, assuming your dataframe is called df:

df_new = pd.DataFrame(index=df.index)
for col in ['C', 'V']:
    col_1 = col + '1'
    col_2 = col + '2'
    df_new[col] = df.apply(lambda row: helper(row, col_1, col_2), axis=1)

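Keep in mind that this approach may be slow for large dataframes, because apply does not take advantage of vectorization. However, it should work even with arbitrary column names (just replace ['C', 'V'] with your actual column names).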
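
If the row-wise apply turns out to be too slow, the same per-column loop can be kept while the choice inside it is vectorized. This is only a sketch: it swaps the helper functions for numpy.where and assumes the same `col + '1'` / `col + '2'` naming as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C1': [1, 5, 9], 'V1': [2, 6, 10],
    'C2': [3, 7, 11], 'V2': [4, 8, 12],
    'Cond': ['X', 'Y', 'X'],
})

# Evaluate the condition once for the whole frame.
is_x = df['Cond'].eq('X')

df_new = pd.DataFrame(index=df.index)
for col in ['C', 'V']:
    # np.where picks from the "1" or "2" column for all rows at once,
    # instead of calling a Python function per row.
    df_new[col] = np.where(is_x, df[col + '1'], df[col + '2'])
```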


5 answers

Answer 1: score 2, 2017-01-02 00:56:12
Answer 2: score 2, accepted, 2017-01-02 08:31:11
Answer 3: score 1, 2017-01-02 01:22:23
Answer 4: score 1, 2017-01-02 01:47:32
Answer 5: score 0, 2017-01-02 01:02:19
