
How to create a new column with the number of days between the dates in two other columns of a pandas DataFrame?

I have a DataFrame in Python pandas like the one below:

col1                | col2
--------------------|--------------------
2020-01-10 19:45:49 | 2020-01-11 00:00:00
2020-01-24 20:14:33 | 2020-01-24 00:00:00
2020-01-24 11:43:15 | 2020-01-20 00:00:00
NaN                 | 2020-08-14 00:00:00
...                 | ...
  • col1 has the 'object' data type and may contain 'NaN' values
  • col2 has the 'object' data type and does not contain 'NaN' values

I need to create col3, which shows how many days have passed since the date in col2 (col1 - col2):

  • if col1 is NaN, col3 should also be NaN
  • if the value in col2 is later than the value in col1, col3 should be negative
  • if the value in col2 is earlier than the value in col1, col3 should be positive
  • if the values in col2 and col1 fall on the same day, col3 should be 0

So as a result I need something like below:

col1                | col2                | col3
--------------------|---------------------|----
2020-01-10 19:45:49 | 2020-01-11 00:00:00 | -1
2020-01-24 20:14:33 | 2020-01-24 00:00:00 | 0
2020-01-24 11:43:15 | 2020-01-20 00:00:00 | 4
NaN                 | 2020-08-14 00:00:00 | NaN
...                 | ...                 | ...

How can I do that in Python Pandas?

Try:

import pandas as pd

# ensure both columns are datetime (NaN in col1 becomes NaT):
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

df["col3"] = (df["col1"] - df["col2"]).dt.days

print(df)

Prints:

                 col1       col2  col3
0 2020-01-10 19:45:49 2020-01-11  -1.0
1 2020-01-24 20:14:33 2020-01-24   0.0
2 2020-01-24 11:43:15 2020-01-20   4.0
3                 NaT 2020-08-14   NaN
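
Note that col3 comes out as float because of the NaN in the last row. For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the rows shown in the question (with None standing in for the NaN), and the final cast to the nullable "Int64" dtype is an optional extra that is not part of the answer above; it keeps whole-number values while still allowing a missing entry.

```python
import pandas as pd

# Sample data copied from the question; both columns start as strings (object dtype).
# None stands in for the NaN in col1 (an assumption about how the gap is stored).
df = pd.DataFrame(
    {
        "col1": ["2020-01-10 19:45:49", "2020-01-24 20:14:33", "2020-01-24 11:43:15", None],
        "col2": ["2020-01-11 00:00:00", "2020-01-24 00:00:00", "2020-01-20 00:00:00", "2020-08-14 00:00:00"],
    }
)

# Convert to datetime; the missing value in col1 becomes NaT.
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# Subtracting gives a Timedelta per row; .dt.days takes its days component,
# which is floored, so a difference of about -4 hours yields -1.
delta = df["col1"] - df["col2"]

# Optional: nullable integer dtype instead of float, missing value shown as <NA>.
df["col3"] = delta.dt.days.astype("Int64")

print(df)
```

Because .dt.days floors the difference, the first row gives -1 even though the two timestamps are only a few hours apart, which matches the result asked for in the question.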

