Below is my dataframe.
I need to subtract dates c/d from a/b based on date availability: if 'a' is NA, I need to select the value from 'b', and the same goes for 'c' and 'd' (if 'c' is NA, I need to select the value from 'd'). I need a column 'e' containing the difference.
How do I loop through each row and perform this kind of subtraction?
Following the logic in my comment, the easiest thing to do with pandas most of the time is to create intermediate columns. You can remove them later or optimize them away if you don't want them, but they are an easy way to encapsulate your logic. What you want to do is take a dataframe like this:
>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       NaN
1       NaN  0.662192  0.703417       NaN
2  0.958970       NaN  0.237540       NaN
3  0.975512  0.241572       NaN  0.720148
4  0.719265  0.735744  0.801279       NaN
and make some intermediate columns that hold the value of df['a'] when it is not NaN, and otherwise fall back to the value of df['b']. You can do this with fillna() pretty easily: it can fill the NaN values with values from another column. Then you can just take the difference of those two columns. For example:
>>> df['a_or_b'] = df['a'].fillna(df['b'])
>>> df['c_or_d'] = df['c'].fillna(df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
          a         b         c         d    a_or_b    c_or_d         e
0  0.414762  0.113796  0.134529       NaN  0.414762  0.134529  0.280233
1       NaN  0.662192  0.703417       NaN  0.662192  0.703417 -0.041225
2  0.958970       NaN  0.237540       NaN  0.958970  0.237540  0.721430
3  0.975512  0.241572       NaN  0.720148  0.975512  0.720148  0.255364
4  0.719265  0.735744  0.801279       NaN  0.719265  0.801279 -0.082013
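Since the question is about dates rather than floats, here is a minimal sketch of the same fillna() pattern on datetime columns. The sample dates are invented for illustration and are not from the original dataframe. Missing dates become NaT after parsing, and subtracting two datetime columns gives a timedelta column.

```python
import pandas as pd

# Hypothetical sample data: these dates are made up for illustration.
df = pd.DataFrame({
    "a": ["2020-01-10", None, "2020-03-05"],
    "b": ["2020-01-08", "2020-02-20", None],
    "c": ["2020-01-01", "2020-02-01", None],
    "d": [None, None, "2020-03-01"],
})

# Parse every column as datetime; missing entries become NaT.
df = df.apply(pd.to_datetime)

# Prefer 'a' and 'c', falling back to 'b' and 'd' where they are missing.
df["e"] = df["a"].fillna(df["b"]) - df["c"].fillna(df["d"])

# 'e' holds timedeltas; .dt.days extracts the whole-day difference.
print(df["e"].dt.days.tolist())
```

No explicit loop over rows is needed here, because fillna() and the subtraction both work on whole columns at once.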
This assumes the missing values are NaN, but yours are the string N/A. You can use replace() in the same way to swap out those strings:
>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       N/A
1       N/A  0.662192  0.703417       N/A
2   0.95897       N/A   0.23754       N/A
3  0.975512  0.241572       N/A  0.720148
4  0.719265  0.735744  0.801279       N/A
>>> df['a_or_b'] = df['a'].replace('N/A', df['b'])
>>> df['c_or_d'] = df['c'].replace('N/A', df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
          a         b         c         d    a_or_b    c_or_d         e
0  0.414762  0.113796  0.134529       N/A  0.414762  0.134529  0.280233
1       N/A  0.662192  0.703417       N/A  0.662192  0.703417 -0.041225
2   0.95897       N/A   0.23754       N/A  0.958970  0.237540  0.721430
3  0.975512  0.241572       N/A  0.720148  0.975512  0.720148  0.255364
4  0.719265  0.735744  0.801279       N/A  0.719265  0.801279 -0.082013
That said, I recommend representing missing data with actual null values such as NaN (np.nan) or None, rather than a string like N/A.
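One way to follow that advice is to convert the string placeholders to real NaN values first and then reuse the fillna() approach. The sketch below assumes the placeholder is exactly the string 'N/A' and uses a shortened three-row version of the data above. It relies on pd.to_numeric with errors='coerce', which turns anything that cannot be parsed as a number into NaN.

```python
import pandas as pd

# Shortened version of the example data, with 'N/A' strings as placeholders.
df = pd.DataFrame({
    "a": [0.414762, "N/A", 0.95897],
    "b": [0.113796, 0.662192, "N/A"],
    "c": [0.134529, 0.703417, 0.23754],
    "d": ["N/A", "N/A", "N/A"],
})

# Coerce each column to numeric; non-numeric entries such as 'N/A' become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Same fallback logic as before, now on proper float columns.
df["e"] = numeric["a"].fillna(numeric["b"]) - numeric["c"].fillna(numeric["d"])

print(df)
```

Working on the converted copy keeps the original columns untouched while the result column 'e' ends up with a float dtype.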
Either way, now you know what the intermediate columns are, so you can use those results directly instead of assigning them to the dataframe if you don't want to keep them.
>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       N/A
1       N/A  0.662192  0.703417       N/A
2   0.95897       N/A   0.23754       N/A
3  0.975512  0.241572       N/A  0.720148
4  0.719265  0.735744  0.801279       N/A
>>> df['e'] = df['a'].replace('N/A', df['b']) - df['c'].replace('N/A', df['d'])
>>> df
          a         b         c         d         e
0  0.414762  0.113796  0.134529       N/A  0.280233
1       N/A  0.662192  0.703417       N/A -0.041225
2   0.95897       N/A   0.23754       N/A  0.721430
3  0.975512  0.241572       N/A  0.720148  0.255364
4  0.719265  0.735744  0.801279       N/A -0.082013