Subtract two dates from different columns based on data availability

Question

Below is my data-frame.

I need to subtract the dates in c/d from the dates in a/b, depending on which values are available. If 'a' is NA, I need to take the value from 'b'; likewise, if 'c' is NA, I need to take the value from 'd'. I need a column 'e' containing the difference.

How do I loop through each row and perform this kind of subtraction?

Answer 1

Following the logic in my comment, the easiest approach in pandas is usually to create intermediate columns. You can remove them or optimize them away later if you don't want them, but they are an easy way to encapsulate your logic. What you want to do is take a dataframe like this:

>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       NaN
1       NaN  0.662192  0.703417       NaN
2  0.958970       NaN  0.237540       NaN
3  0.975512  0.241572       NaN  0.720148
4  0.719265  0.735744  0.801279       NaN

and make intermediate columns that hold the value of df['a'] where it is not NaN and the value of df['b'] otherwise (and likewise for df['c'] and df['d']). fillna() does this easily: it can fill the NaN values with values from another column. Then you just take the difference of the two intermediate columns. For example:

>>> df['a_or_b'] = df['a'].fillna(df['b'])
>>> df['c_or_d'] = df['c'].fillna(df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
          a         b         c         d    a_or_b    c_or_d         e
0  0.414762  0.113796  0.134529       NaN  0.414762  0.134529  0.280233
1       NaN  0.662192  0.703417       NaN  0.662192  0.703417 -0.041225
2  0.958970       NaN  0.237540       NaN  0.958970  0.237540  0.721430
3  0.975512  0.241572       NaN  0.720148  0.975512  0.720148  0.255364
4  0.719265  0.735744  0.801279       NaN  0.719265  0.801279 -0.082013

This assumes the missing values are NaN, but yours are the string N/A. You can use replace() in the same way to replace string values:

>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       N/A
1       N/A  0.662192  0.703417       N/A
2   0.95897       N/A   0.23754       N/A
3  0.975512  0.241572       N/A  0.720148
4  0.719265  0.735744  0.801279       N/A
>>> df['a_or_b'] = df['a'].replace('N/A', df['b'])
>>> df['c_or_d'] = df['c'].replace('N/A', df['d'])
>>> df['e'] = df['a_or_b'] - df['c_or_d']
>>> df
          a         b         c         d    a_or_b    c_or_d         e
0  0.414762  0.113796  0.134529       N/A  0.414762  0.134529  0.280233
1       N/A  0.662192  0.703417       N/A  0.662192  0.703417 -0.041225
2   0.95897       N/A   0.23754       N/A  0.958970  0.237540  0.721430
3  0.975512  0.241572       N/A  0.720148  0.975512  0.720148  0.255364
4  0.719265  0.735744  0.801279       N/A  0.719265  0.801279 -0.082013

That said, I recommend representing missing data with actual null-type values such as NaN (np.nan) or None, rather than a string like N/A.
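
One way to follow that advice, as a minimal sketch: convert the columns to numeric first, so that every 'N/A' becomes a real NaN, and then use fillna() as in the first example. This assumes the columns were loaded as object dtype with the literal string 'N/A' marking missing entries; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: missing entries are the literal string 'N/A'.
df = pd.DataFrame({
    'a': [0.5, 'N/A', 0.75],
    'b': [0.25, 0.5, 'N/A'],
    'c': [0.125, 0.25, 'N/A'],
    'd': ['N/A', 'N/A', 0.5],
})

# errors='coerce' turns anything that cannot be parsed as a number into NaN,
# so each column ends up as float64 with proper missing values.
df = df.apply(pd.to_numeric, errors='coerce')

# Same logic as before: prefer a over b and c over d, then subtract.
df['e'] = df['a'].fillna(df['b']) - df['c'].fillna(df['d'])
print(df)
```

Note that errors='coerce' also converts any other unparseable string to NaN, so it is worth checking the data for unexpected values before relying on it.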

Either way, now that you know what the intermediate columns are, you can use those expressions directly instead of assigning them to the dataframe if you don't want the extra columns.

>>> df
          a         b         c         d
0  0.414762  0.113796  0.134529       N/A
1       N/A  0.662192  0.703417       N/A
2   0.95897       N/A   0.23754       N/A
3  0.975512  0.241572       N/A  0.720148
4  0.719265  0.735744  0.801279       N/A
>>> df['e'] = df['a'].replace('N/A', df['b']) - df['c'].replace('N/A', df['d'])
>>> df
          a         b         c         d         e
0  0.414762  0.113796  0.134529       N/A  0.280233
1       N/A  0.662192  0.703417       N/A -0.041225
2   0.95897       N/A   0.23754       N/A  0.721430
3  0.975512  0.241572       N/A  0.720148  0.255364
4  0.719265  0.735744  0.801279       N/A -0.082013
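
Since the question is about dates rather than floats, here is a sketch of the same pattern applied to datetime columns. It assumes the columns hold date strings in YYYY-MM-DD form with 'N/A' for missing entries; the sample dates and the extra 'e_days' column are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: ISO date strings, with 'N/A' for missing entries.
df = pd.DataFrame({
    'a': ['2018-11-10', 'N/A', '2018-11-15'],
    'b': ['2018-11-09', '2018-11-12', 'N/A'],
    'c': ['2018-11-01', '2018-11-05', 'N/A'],
    'd': ['N/A', 'N/A', '2018-11-10'],
})

# Parse each column as datetimes; errors='coerce' turns strings that do not
# match the format (such as 'N/A') into NaT, the datetime null value.
for col in ['a', 'b', 'c', 'd']:
    df[col] = pd.to_datetime(df[col], format='%Y-%m-%d', errors='coerce')

# fillna() works on NaT just as it does on NaN.
df['e'] = df['a'].fillna(df['b']) - df['c'].fillna(df['d'])

# 'e' holds Timedelta values; .dt.days extracts whole days as integers.
df['e_days'] = df['e'].dt.days
print(df)
```

No explicit loop over rows is needed here either: fillna() and the subtraction both operate on whole columns at once.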
