I tried to find a solution for my problem, but came up short. Please let me know if it exists elsewhere.
I have a dataframe with 4 columns, like this:
'A'     'B'  'C'     'D'
cheese  5    grapes  7
grapes  7    cheese  8
steak   1    eggs    21
eggs    2    steak   1
The entries in 'C' and 'D' must be consistent with the pairs in 'A' and 'B', though not row by row. For example, if "cheese" has 5 in 'B', then any row with "cheese" in 'C' must have 5 in 'D', not 8. When a 'C'/'D' pair contradicts the 'A'/'B' pairs, 'C' should be reset to a default value and 'D' to 0. Here, the "cheese" entry in 'C' (paired with 8) should become C: default and D: 0, and so should the "eggs" entry (paired with 21 instead of 2). The "grapes" (7) and "steak" (1) entries already match, so they stay as they are.
So the output should look like this:
'A'     'B'  'C'      'D'
cheese  5    grapes   7
grapes  7    default  0
steak   1    default  0
eggs    2    steak    1
I tried converting 'A' and 'B' to lists of unique values and then replacing the 'C' and 'D' values based on those lists. I tried all of the conditional df.replace() tricks I could find on Stack Overflow, but came up with nothing.
Thank you in advance for any help you provide.
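The rule in the question amounts to a lookup from each 'A' item to its 'B' number, against which every 'C'/'D' pair is checked. The plain-Python sketch below applies that rule to the example rows. The names `rows`, `lookup`, and `check_pair` are illustrative only, and two points are assumptions, since the question does not cover them: items in 'C' that never appear in 'A' are left unchanged, and each item appears only once in 'A'.

```python
# Example rows from the question as (A, B, C, D) tuples.
rows = [
    ("cheese", 5, "grapes", 7),
    ("grapes", 7, "cheese", 8),
    ("steak", 1, "eggs", 21),
    ("eggs", 2, "steak", 1),
]

# Lookup built from the A/B columns: item -> number.
# Assumes each item appears once in A (a later duplicate would overwrite an earlier one).
lookup = {a: b for a, b, _, _ in rows}


def check_pair(c, d):
    """Return the corrected (C, D) pair for one row.

    Assumption: an item in C that never appears in A is left unchanged.
    """
    if c in lookup and lookup[c] != d:
        return ("default", 0)
    return (c, d)


corrected = [(a, b, *check_pair(c, d)) for a, b, c, d in rows]
for row in corrected:
    print(row)
```

Only the "cheese"/8 and "eggs"/21 pairs disagree with the lookup, so those two rows are the ones reset to ("default", 0).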
Setup
import pandas as pd

df = pd.DataFrame({'A': {0: 'cheese', 1: 'grapes', 2: 'steak', 3: 'eggs'},
                   'B': {0: 5, 1: 7, 2: 1, 3: 2},
                   'C': {0: 'grapes', 1: 'cheese', 2: 'eggs', 3: 'steak'},
                   'D': {0: 7, 1: 8, 2: 21, 3: 1}})
df
        A  B       C   D
0  cheese  5  grapes   7
1  grapes  7  cheese   8
2   steak  1    eggs  21
3    eggs  2   steak   1
Solution
# Keep C when it does not appear in A, or when D equals the B value paired with C in A;
# otherwise replace C with 'default'.
df.C = df.apply(lambda x: x.C
                if (x.C not in df.A.tolist()) or (x.D == df.loc[df.A == x.C, 'B'].iloc[0])
                else 'default',
                axis=1)
# Set D to 0 wherever C is now 'default'.
df.loc[df.C == 'default', 'D'] = 0
df
        A  B        C  D
0  cheese  5   grapes  7
1  grapes  7  default  0
2   steak  1  default  0
3    eggs  2    steak  1
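The `apply` solution looks up `df.loc[df.A==x.C, 'B']` once per row, which scans column A again for every row. An alternative sketch, not part of the original answer, builds the A-to-B lookup once as a Series and uses `Series.map` to compare all rows together. Like the answer, it assumes that C values missing from A are left unchanged and that the first occurrence wins if A contains duplicates; the names `lookup`, `expected`, and `mismatch` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['cheese', 'grapes', 'steak', 'eggs'],
                   'B': [5, 7, 1, 2],
                   'C': ['grapes', 'cheese', 'eggs', 'steak'],
                   'D': [7, 8, 21, 1]})

# Lookup Series: index = item from A, value = its B number.
# drop_duplicates keeps the first occurrence, mirroring .iloc[0] in the apply version,
# and guarantees a unique index so that Series.map can use it.
lookup = df.drop_duplicates('A').set_index('A')['B']

# B value paired with each C item; NaN where the C item does not appear in A.
expected = df['C'].map(lookup)

# A row is a mismatch only if its C item appears in A and D differs from that item's B.
mismatch = expected.notna() & (expected != df['D'])

df.loc[mismatch, 'C'] = 'default'
df.loc[mismatch, 'D'] = 0
print(df)
```

The boolean mask is computed once and reused for both columns, so C and D are always reset together.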