Python Pandas replace value based on multiple column conditions

I have a dataframe:

import numpy as np
import pandas as pd

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}
df_in = pd.DataFrame(data_in)

print(df_in)

    A   B   C
0  A1      C1
1      B2  C2
2      B3    
3  A4        
4          C5

I'm trying to replace the value in column A or B with the value in column C when C is not empty and A (or B) is not empty. After replacing, I need to clear the value in column C.

I expect this output:

    A   B   C
0   C1      
1       C2  
2       B3  
3   A4      
4           C5

I tried several things; the closest is:

df_in['A'] = np.where(
   (df_in['A'] !='') & (df_in['C'] != '') , df_in['A'], df_in['C']
   )

df_in['B'] = np.where(
   (df_in['B'] !='') & (df_in['C'] != '') , df_in['B'], df_in['C']
   )

But this also clears the other values: I lose A4 and B3, and C1 and C2 are not cleared.

What I got:

    A   B   C
0   C1      C1
1       C2  C2
2           
3           
4           C5

Thank you

You are very close, but you have the arguments switched in np.where; the syntax is np.where(cond, if_cond_True, if_cond_False). Columns A and B should take the value of column C if the condition is satisfied (if_cond_True); otherwise they keep their original values (if_cond_False).
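A minimal illustration of that argument order on toy arrays (the array names and values are arbitrary, chosen only for the demo): wherever cond is True the result takes the element from the second argument, and everywhere else it takes the element from the third.

```python
import numpy as np

cond = np.array([True, False, True])
if_true = np.array(['x1', 'x2', 'x3'])
if_false = np.array(['y1', 'y2', 'y3'])

# Where cond is True, pick from if_true; otherwise pick from if_false
result = np.where(cond, if_true, if_false)
print(result)  # ['x1' 'y2' 'x3']
```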

import pandas as pd
import numpy as np 

data_in = {'A':['A1', '', '', 'A4',''],
        'B':['', 'B2', 'B3', '',''],
        'C':['C1','C2','','','C5']}

df_in = pd.DataFrame(data_in)

maskA = df_in['A'] != ''   # A not empty
maskB = df_in['B'] != ''   # B not empty
maskC = df_in['C'] != ''   # C not empty

# If A and C are both non-empty, A takes C's value; otherwise A keeps its value
df_in['A'] = np.where(maskA & maskC, df_in['C'], df_in['A'])

# If B and C are both non-empty, B takes C's value; otherwise B keeps its value
df_in['B'] = np.where(maskB & maskC, df_in['C'], df_in['B'])

# If (A and C are non-empty) or (B and C are non-empty), clear C;
# otherwise C keeps its value. The masks were computed before A and B
# were overwritten, so they still describe the original data.
df_in['C'] = np.where((maskA & maskC) | (maskB & maskC), "", df_in['C'])

Output

>>> df_in 

    A   B   C
0  C1        
1      C2    
2      B3    
3  A4        
4          C5
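The same logic can also be written with Series.mask, which replaces values where a condition is True and keeps them elsewhere (the inverse of Series.where). A sketch, assuming the question's data with empty strings as the "empty" marker; as in the np.where version, every mask is computed from the original values before any column is changed.

```python
import pandas as pd

data_in = {'A': ['A1', '', '', 'A4', ''],
           'B': ['', 'B2', 'B3', '', ''],
           'C': ['C1', 'C2', '', '', 'C5']}
df_in = pd.DataFrame(data_in)

# Build all conditions from the untouched columns first
maskC = df_in['C'] != ''
replaceA = (df_in['A'] != '') & maskC
replaceB = (df_in['B'] != '') & maskC

# Series.mask(cond, other=...) swaps in `other` where cond is True
df_in['A'] = df_in['A'].mask(replaceA, other=df_in['C'])
df_in['B'] = df_in['B'].mask(replaceB, other=df_in['C'])
df_in['C'] = df_in['C'].mask(replaceA | replaceB, other='')

print(df_in)
```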

I'm not sure offhand whether there is an issue with setting a column's value when that same column also appears in the where condition, but you could always create a temporary column and then rename or drop the other columns based on it.
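A minimal sketch of that temporary-column idea, assuming the question's empty-string data; the column names A_new, B_new and C_new are hypothetical. All conditions are computed from the original columns, and the new values go into separate columns before being swapped in, so no condition reads an already-updated value. For a single statement like df['A'] = np.where(cond, df['C'], df['A']), the right-hand side is fully evaluated before the assignment happens, so the risk only arises when a later condition reads a column that an earlier step already overwrote.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['A1', '', '', 'A4', ''],
                   'B': ['', 'B2', 'B3', '', ''],
                   'C': ['C1', 'C2', '', '', 'C5']})

# Conditions read only the original columns
maskC = df['C'] != ''
condA = (df['A'] != '') & maskC
condB = (df['B'] != '') & maskC

# Write results into hypothetical temporary columns
df['A_new'] = np.where(condA, df['C'], df['A'])
df['B_new'] = np.where(condB, df['C'], df['B'])
df['C_new'] = np.where(condA | condB, '', df['C'])

# Drop the originals and rename the temporaries into their place
df = (df.drop(columns=['A', 'B', 'C'])
        .rename(columns={'A_new': 'A', 'B_new': 'B', 'C_new': 'C'}))
print(df)
```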

An alternative is to use the apply function.

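import pandas as pd

def is_empty(value):
    # Treat both NaN/None and the empty string as "empty"
    return pd.isna(value) or value == ''

def update_data(row):
    row = row.copy()  # work on a copy instead of mutating the row pandas passes in
    a = row['A']
    b = row['B']
    c = row['C']

    if not is_empty(c):
        if is_empty(a):
            row['A'] = c

        if is_empty(b):
            row['B'] = c

    return row

df_new = df_in.apply(update_data, axis=1)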

apply can produce the result you want, but I'm not certain what your desired outcome is, so you may need to adjust the logic. The above sets column A and/or B to C when A (or B) is empty and C is not empty; otherwise it does not update anything. Note that pandas does not treat "" as a missing value (pd.isna('') is False), so an emptiness check has to test for the empty string as well as for NaN.
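Because the question stores "empty" cells as empty strings rather than NaN, a quick check of how pandas classifies them may help. This uses only pd.isna, Series.isna and ordinary string comparison:

```python
import numpy as np
import pandas as pd

print(pd.isna(''))      # False: an empty string is a real value, not missing
print(pd.isna(np.nan))  # True
print(pd.isna(None))    # True

s = pd.Series(['C1', '', np.nan])
print(s.isna().tolist())  # [False, False, True]

# Combine both checks to treat '' and NaN alike
empty = s.isna() | (s == '')
print(empty.tolist())     # [False, True, True]
```

An alternative design is to convert empty strings to NaN up front, for example with df.replace('', np.nan), and then rely on isna alone.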

I'm not sure what you mean by "clear column C". You can just drop the column if that's what you want. If you want to change the values instead, you can do so in the update_data function or with a string replace.
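A sketch of two of those options. The starting frame is hypothetical (it mimics A and B after values were copied from C), and detecting "copied" rows by comparing A and B against C is an assumption that only holds if A or B never coincidentally equal C.

```python
import pandas as pd

# Hypothetical frame: A/B already filled from C, C not yet cleared
df = pd.DataFrame({'A': ['C1', '', '', 'A4', ''],
                   'B': ['', 'C2', 'B3', '', ''],
                   'C': ['C1', 'C2', '', '', 'C5']})

# Option 1: drop column C entirely (returns a new frame)
without_c = df.drop(columns='C')

# Option 2: blank C only in rows whose value now appears in A or B
copied = (df['C'] != '') & ((df['A'] == df['C']) | (df['B'] == df['C']))
df.loc[copied, 'C'] = ''
print(df)
```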

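Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.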
