Replace values in column based on values in two other columns using pandas

Question

I tried to find a solution for my problem, but came up short. Please let me know if it exists elsewhere.

I have a dataframe with 4 columns, like this:

'A'    'B'    'C'      'D'   

cheese  5     grapes    7  
grapes  7     cheese    8  
steak   1     eggs      21  
eggs    2     steak     1

The pairs in 'C' and 'D' must match the pairs in 'A' and 'B', though not necessarily in the same row; for example, if "cheese" has 5 in 'B', "cheese" cannot have 8 in 'D'. In the case of a mismatch, the 'C' and 'D' values must be corrected to a default. Here, "cheese" should be corrected so that C is "default" and D is 0. Same with "eggs". "grapes" and "steak" are fine, though.

So the output should look like this:

'A'    'B'  'C'     'D'
cheese  5    grapes  7 
grapes  7    default 0  
steak   1    default 0   
eggs    2    steak   1

I tried to convert 'A' and 'B' to lists with unique values, and then tried to replace 'C' and 'D' values based on the list. I tried all of the conditional df.replace() tricks I could find on stackoverflow, but came up with nothing.

Thank you in advance for any help you provide.

Answer 1

Setup

import pandas as pd

df = pd.DataFrame({'A': {0: 'cheese', 1: 'grapes', 2: 'steak', 3: 'eggs'},
 'B': {0: 5, 1: 7, 2: 1, 3: 2},
 'C': {0: 'grapes', 1: 'cheese', 2: 'eggs', 3: 'steak'},
 'D': {0: 7, 1: 8, 2: 21, 3: 1}})

df
Out[1262]: 
        A  B       C   D
0  cheese  5  grapes   7
1  grapes  7  cheese   8
2   steak  1    eggs  21
3    eggs  2   steak   1

Solution

# Keep C if it is absent from A or if D equals the B paired with it in A; otherwise use 'default'.
df.C = df.apply(
    lambda x: x.C if (x.C not in df.A.tolist()) or (x.D == df.loc[df.A == x.C, 'B'].iloc[0]) else 'default',
    axis=1,
)
# Set D to 0 wherever C is 'default'.
df.loc[df.C == 'default', 'D'] = 0

df
Out[1259]: 
        A  B        C  D
0  cheese  5   grapes  7
1  grapes  7  default  0
2   steak  1  default  0
3    eggs  2    steak  1
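
As a possible alternative to the row-wise apply, the same check can be sketched with a lookup Series and map, which avoids rebuilding the list of 'A' values for every row. This is only a minimal sketch: it assumes the names in 'A' are unique, it rebuilds the frame from the question's input table, and the names lookup, expected and mismatch are illustrative.

```python
import pandas as pd

# Input frame from the question (before any correction).
df = pd.DataFrame({
    "A": ["cheese", "grapes", "steak", "eggs"],
    "B": [5, 7, 1, 2],
    "C": ["grapes", "cheese", "eggs", "steak"],
    "D": [7, 8, 21, 1],
})

# Lookup from each name in A to its value in B.
# Assumption: names in A are unique, so each name has exactly one B value.
lookup = df.set_index("A")["B"]

# The B value paired with each row's C; NaN where C does not appear in A.
expected = df["C"].map(lookup)

# A row is a mismatch only if C is known in A and D differs from the paired B.
mismatch = expected.notna() & (df["D"] != expected)

# Reset mismatched rows to the defaults described in the question.
df.loc[mismatch, "C"] = "default"
df.loc[mismatch, "D"] = 0
```

Like the apply version, rows whose 'C' value never appears in 'A' are left unchanged.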

1 answer

solution1
0 ACCEPTED 2017-05-13 23:22:37
