I have a pandas data frame like this one:
dx1    dx2    dx3    dx4    dxpoa1  dxpoa2  dxpoa3  dxpoa4
25041  40391                Y       E
25041  40391  25081         N       W       U
25041  40391  42822  99681  1       N       Y       Y
There are two sets of columns: dx and dxpoa. Depending on the value in dxpoa, I have to either keep the corresponding dx value or discard it. For each dx value there is a value in the matching dxpoa column of the same row. For example, if dxpoa is one of 'Y', 'W', '1' or 'E', keep the dx value in that row; otherwise discard it, i.e. fill it with 0. In the first row dxpoa1 is 'Y', so dx1 stays as it is. In the second row dxpoa1 is 'N', so dx1 in the second row becomes 0.
Given a dataframe built like so:
import pandas as pd
import numpy as np
df = pd.DataFrame({'dx1':[25041,25041,25041],
'dx2':[40391,40391,40391],
'dx3':[np.nan,25081,42822],
'dx4':[np.nan,np.nan,99681],
'dxpoa1':['Y','N','1'],
'dxpoa2':['E','W','N'],
'dxpoa3':[np.nan,'U','Y'],
'dxpoa4':[np.nan,np.nan,'Y']})
Which gives:
     dx1    dx2    dx3    dx4 dxpoa1 dxpoa2 dxpoa3 dxpoa4
0  25041  40391    NaN    NaN      Y      E    NaN    NaN
1  25041  40391  25081    NaN      N      W      U    NaN
2  25041  40391  42822  99681      1      N      Y      Y
Define a function that implements your substitution rule. It replaces the target column with zero when the value in the reference column is not 'Y', 'W', '1' or 'E', as I understood from your description:
def subfunc(row, col_reference=None, col_target=None):
    if row[col_reference] not in ['Y', 'W', '1', 'E']:
        row[col_target] = 0
    return row
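To see what the rule does before running it over the whole frame, it can be tried on a single row. This is only a small sketch: the two-column rows below are made up for illustration, and subfunc is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def subfunc(row, col_reference=None, col_target=None):
    # Zero the target when the reference flag is not one of the "keep" codes.
    if row[col_reference] not in ['Y', 'W', '1', 'E']:
        row[col_target] = 0
    return row

# Illustrative rows (not taken from the question's data).
kept = subfunc(pd.Series({'dx1': 25041, 'dxpoa1': 'Y'}),
               col_reference='dxpoa1', col_target='dx1')
zeroed = subfunc(pd.Series({'dx1': 25041, 'dxpoa1': 'N'}),
                 col_reference='dxpoa1', col_target='dx1')
# A missing flag is not in the list either, so the target is zeroed too.
missing = subfunc(pd.Series({'dx1': 25041, 'dxpoa1': np.nan}),
                  col_reference='dxpoa1', col_target='dx1')

print(kept['dx1'], zeroed['dx1'], missing['dx1'])
```

Note that a NaN flag counts as "not in the list", which is why dx3 and dx4 end up as 0 wherever the dxpoa value is missing.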
Then iterate over the column names applying subfunc over each row:
for colname in df.columns:
    if 'dxpoa' in colname:
        colid = colname.split('dxpoa')[1]
        df = df.apply(subfunc, axis=1, col_reference=colname, col_target='dx'+colid)
This results in the dataframe:
     dx1    dx2    dx3    dx4 dxpoa1 dxpoa2 dxpoa3 dxpoa4
0  25041  40391      0      0      Y      E    NaN    NaN
1      0  40391      0      0      N      W      U    NaN
2  25041      0  42822  99681      1      N      Y      Y
Here's a vectorized way of looking at it (using @vmg's handy starting frame):
>>> N = len(df.columns)
>>> keep = df.iloc[:,-N//2:].isin(["Y", "W", "1", "E"]).values
>>> df.iloc[:,:N//2] = df.iloc[:,:N//2].where(keep, 0)
>>> df
     dx1    dx2    dx3    dx4 dxpoa1 dxpoa2 dxpoa3 dxpoa4
0  25041  40391      0      0      Y      E    NaN    NaN
1      0  40391      0      0      N      W      U    NaN
2  25041      0  42822  99681      1      N      Y      Y
What this does is make an array of True and False for the last N//2 columns, with True where the value is in the list and False where it's not (note also that I'm assuming 1 is the string "1" and not the integer 1):
>>> df.iloc[:,-N//2:]
  dxpoa1 dxpoa2 dxpoa3 dxpoa4
0      Y      E    NaN    NaN
1      N      W      U    NaN
2      1      N      Y      Y
>>> df.iloc[:,-N//2:].isin(["Y", "W", "1", "E"])
  dxpoa1 dxpoa2 dxpoa3 dxpoa4
0   True   True  False  False
1  False   True  False  False
2   True  False   True   True
>>> df.iloc[:,-N//2:].isin(["Y", "W", "1", "E"]).values
array([[ True,  True, False, False],
       [False,  True, False, False],
       [ True, False,  True,  True]], dtype=bool)
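The string-versus-integer assumption matters because isin does not treat the integer 1 as equal to the string "1". A small sketch of the difference, using a made-up flag column (the names flags, strings_only and normalized are illustrative only); casting the flags to str first is one possible workaround if the real data mixes both types:

```python
import numpy as np
import pandas as pd

# Hypothetical flag column that mixes the string "1" and the integer 1.
flags = pd.Series(['1', 1, 'Y', np.nan], dtype=object)

# Only the string "1" matches; the integer 1 and NaN do not.
strings_only = flags.isin(['Y', 'W', '1', 'E'])

# Casting to str first makes the integer 1 compare equal to "1";
# the missing value still fails the membership test.
normalized = flags.astype(str).isin(['Y', 'W', '1', 'E'])

print(strings_only.tolist())
print(normalized.tolist())
```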
Then we can use where to set the value of the first N//2 columns, keeping the values where keep is True and otherwise replacing them with 0.
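The slicing above relies on the dx columns being exactly the first half of the frame and the dxpoa columns the second half, in matching order. If that cannot be guaranteed, the same idea can be sketched with the columns picked by name instead. The variable names dx_cols and poa_cols are illustrative, and the naming scheme dxN / dxpoaN is assumed from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dx1': [25041, 25041, 25041],
                   'dx2': [40391, 40391, 40391],
                   'dx3': [np.nan, 25081, 42822],
                   'dx4': [np.nan, np.nan, 99681],
                   'dxpoa1': ['Y', 'N', '1'],
                   'dxpoa2': ['E', 'W', 'N'],
                   'dxpoa3': [np.nan, 'U', 'Y'],
                   'dxpoa4': [np.nan, np.nan, 'Y']})

# Pair each dxN column with its dxpoaN flag column by name, not position.
dx_cols = [c for c in df.columns
           if c.startswith('dx') and not c.startswith('dxpoa')]
poa_cols = ['dxpoa' + c[len('dx'):] for c in dx_cols]

# Boolean array: True where the flag says the dx value should be kept.
keep = df[poa_cols].isin(['Y', 'W', '1', 'E']).to_numpy()

# Keep dx values where the mask is True, write 0 everywhere else.
df[dx_cols] = df[dx_cols].where(keep, 0)

print(df)
```

Converting the mask to a plain array is deliberate: a DataFrame mask would be aligned on column labels, and the dxpoa labels do not match the dx labels.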