
Convert days difference to numeric values in Python pandas

I have a column 'datedif' in my dataframe:

  exposuredate min_exposure_date    datedif
  2014-10-08   2014-09-27           11 days
  2014-10-09   2014-09-27           12 days
  2014-09-27   2014-09-27           0 days
  2014-09-27   2014-09-27           0 days
  2014-10-22   2014-09-27           25 days

  data.exposuredate = pd.to_datetime(data.exposuredate)
  data.min_exposure_date = pd.to_datetime(data.min_exposure_date)

  data['datedif'] = ((data.exposuredate)-(data.min_exposure_date))

Both date columns are of dtype datetime64[ns]. I want to extract the number of days from the 'datedif' field, but I have not been able to find anything that does this.

I tried:

data['datedif_day'] = data['datedif'].dt.days

Error:

AttributeError: 'Series' object has no attribute 'dt'

The pandas docs refer to the type of conversion you are looking for as Frequency Conversion.

The two options are 1) division by Timedelta or 2) type conversion. There is a subtle difference between the two as stated in the docs:

"Note that division by the numpy scalar is true division, while astyping is equivalent of floor division."

import pandas as pd

data = pd.DataFrame([("2014-10-08", "2014-09-27"),
                     ("2014-10-09", "2014-09-27"),
                     ("2014-09-27", "2014-09-27"),
                     ("2014-10-22", "2014-09-27")],
                    columns=["exposuredate", "min_exposure_date"])

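# Parentheses let the expression continue onto the next line
data['datediff'] = (pd.to_datetime(data.exposuredate)
                    - pd.to_datetime(data.min_exposure_date))

# Option 1: division by a Timedelta
data['datediff'] / pd.Timedelta(1, unit='d')
# Option 2: type conversion (worked on older pandas; pandas 2.0+ reportedly rejects the 'D' resolution)
data['datediff'].astype('timedelta64[D]')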

Both operations yield:

0    11.0
1    12.0
2     0.0
3    25.0
Name: datediff, dtype: float64
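The two results only coincide here because every gap is a whole number of days. As a rough sketch of the difference the docs describe, the example below uses made-up timestamps with a time-of-day component (the `start` and `end` values are assumptions, not data from the question), and uses `//` to stand in for the flooring behaviour:

```python
import pandas as pd

# Hypothetical timestamps, chosen so the gaps are not whole days.
start = pd.to_datetime(pd.Series(["2014-09-27 00:00", "2014-09-27 00:00"]))
end = pd.to_datetime(pd.Series(["2014-10-08 12:00", "2014-09-27 06:00"]))
delta = end - start

one_day = pd.Timedelta(days=1)
true_days = delta / one_day    # true division keeps the fractional part
floor_days = delta // one_day  # floor division drops it

print(true_days.tolist())
print(floor_days.tolist())
```

If fractional days matter (for example, half-day exposures), division is probably the safer choice; if only whole days are wanted, flooring may be enough.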

If you are using the date difference as a feature for training a machine learning algorithm, it doesn't matter in which form it is represented, as it should be normalised anyway; timedelta64[ns] is perfectly fine for that. For visualisation purposes, see this post.

The 'datedif' column is displayed in days, but it is actually stored in nanoseconds. To get the number of days for further use, add the following lines to the code:

   import numpy as np

   data['datedif'] = data['datedif'].astype(np.int64)        # timedelta -> integer nanoseconds
   data['datedif_day'] = (data['datedif']/86400000000000)    # 86400 s/day * 1e9 ns/s
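A variant of the same idea that avoids the hard-coded nanosecond constant might look like the sketch below. It assumes a pandas version in which the `.dt` accessor works on timedelta Series (the error in the question suggests the asker's version did not have it), and the `datedif` Series here is a hypothetical stand-in for the column from the question:

```python
import pandas as pd

# Hypothetical stand-in for the 'datedif' column from the question.
datedif = pd.Series(pd.to_timedelta([11, 12, 0, 25], unit="D"))

SECONDS_PER_DAY = 24 * 60 * 60

# total_seconds() returns floats, so the result is a float number of days.
days = datedif.dt.total_seconds() / SECONDS_PER_DAY

print(days.tolist())
```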

I came across this same question today, and I think the following solution is the easiest:

Setup:

import pandas as pd

df = pd.DataFrame([("2014-10-08", "2014-09-27"),
                   ("2014-10-09", "2014-09-27"),
                   ("2014-09-27", "2014-09-27"),
                   ("2014-10-22", "2014-09-27")],
                  columns=["exposuredate", "min_exposure_date"])

df['datediff'] = pd.to_datetime(df.exposuredate) - pd.to_datetime(df.min_exposure_date)

    exposuredate    min_exposure_date   datediff
0   2014-10-08      2014-09-27          11 days
1   2014-10-09      2014-09-27          12 days
2   2014-09-27      2014-09-27          0 days
3   2014-10-22      2014-09-27          25 days

Solution:

df.datediff.apply(lambda x: x.days)

0    11
1    12
2     0
3    25
Name: datediff, dtype: int64
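On a pandas version where the `.dt` accessor is available for timedelta Series (the question's AttributeError suggests it was missing in the asker's version), the same result can likely be obtained without a Python-level lambda. The sketch below rebuilds the setup from this answer; the column name `datediff_days` is introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame([("2014-10-08", "2014-09-27"),
                   ("2014-10-09", "2014-09-27"),
                   ("2014-09-27", "2014-09-27"),
                   ("2014-10-22", "2014-09-27")],
                  columns=["exposuredate", "min_exposure_date"])

df["datediff"] = pd.to_datetime(df["exposuredate"]) - pd.to_datetime(df["min_exposure_date"])

# Vectorised equivalent of apply(lambda x: x.days); yields whole days as integers.
df["datediff_days"] = df["datediff"].dt.days

print(df["datediff_days"].tolist())
```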
