
Python Dateutil: Unable to calculate age from two dates (relativedelta)

I'm trying to create a new column in a DataFrame that holds a person's age, calculated with dateutil's relativedelta function, using the following code:

df['Age'] = relativedelta(df['Today'], df['DOB']).years

However, I'm getting the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-99-f87ca88a2e3c> in <module>()
      1 
----> 2 df['Years of Age2'] = relativedelta(df['Today'], df['DOB']).years

C:\anaconda3\lib\site-packages\dateutil\relativedelta.py in __init__(self, dt1, dt2, years, months, days, leapdays, weeks, hours, minutes, seconds, microseconds, year, month, day, weekday, yearday, nlyearday, hour, minute, second, microsecond)
    101                              "ambiguous and not currently supported.")
    102 
--> 103         if dt1 and dt2:
    104             # datetime is a subclass of date. So both must be date
    105             if not (isinstance(dt1, datetime.date) and

C:\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
    953         raise ValueError("The truth value of a {0} is ambiguous. "
    954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955                          .format(self.__class__.__name__))
    956 
    957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().



It works outside a DataFrame, as shown below:

DOB = datetime.date(1990,8,25)
Today = datetime.date.today()

relativedelta(Today, DOB).years

Out[2]: 29

=====================================================================

So I assume I am doing something wrong when passing the data types from the DataFrame into the function?

I can calculate age a different way with the code below; I just don't understand why the first method isn't working.

df['Years of Age'] = np.round((df['Today'] - df['DOB'])/np.timedelta64(1,'Y'),decimals = 0)

Here is the starter code:


import pandas as pd
import numpy as np
import datetime
from dateutil.relativedelta import relativedelta 


ind = 'Andy Brandy Cindy'

MyDict = {"DOB" : [ (datetime.date(1954,7,5)),
                    (datetime.date(1998,1,27)),
                    (datetime.date(2001,3,15)) ]}

df = pd.DataFrame(data=MyDict,index=ind.split())

df['Today'] = datetime.date.today()

df


        DOB         Today
Andy    1954-07-05  2019-08-30
Brandy  1998-01-27  2019-08-30
Cindy   2001-03-15  2019-08-30

Here is the calculation:

df['Age'] = relativedelta(df['Today'], df['DOB']).years

I don't think relativedelta can accept pandas Series as parameters. The traceback shows that the failure happens when the code behind relativedelta evaluates if dt1 and dt2: on the parameters you passed, the first of which, dt1, is the Series df['Today'] in your code. That check asks for the truth value of a whole Series, so pandas raises the ValueError saying the truth value is ambiguous; the code never reaches the isinstance check on the lines that follow. Outside a DataFrame it works because you pass datetime.date objects directly rather than Series. So you could use apply to calculate the difference between the two dates row by row:

df['Age'] = df.apply(lambda x: relativedelta(x['Today'], x['DOB']).years, axis=1)

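However, I think the workaround you found is faster, though not as precise as relativedelta: it divides by an average year length and rounds to the nearest whole number, so it can report an age that is one year too high.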
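If you want both speed and whole-year ages, one possible approach is sketched below. It is only an illustration, not something from the question: the fixed reference date, the column name Age vectorized and the helper variables dob, today and had_birthday are all assumptions of mine. It first reproduces the ambiguous truth-value error, then compares the row-wise apply with a vectorized calculation that subtracts the years and takes one off when the birthday has not yet occurred in the reference year.

```python
import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

df = pd.DataFrame(
    {"DOB": [datetime.date(1954, 7, 5),
             datetime.date(1998, 1, 27),
             datetime.date(2001, 3, 15)]},
    index=["Andy", "Brandy", "Cindy"],
)
# Assumption: a fixed reference date instead of date.today(), so the
# result is reproducible.
df["Today"] = datetime.date(2019, 8, 30)

# The failing check: a Series has no single truth value, which is what
# `if dt1 and dt2:` inside relativedelta asks for.
try:
    bool(df["Today"])
except ValueError as exc:
    truth_error = str(exc)

# Row-wise: relativedelta receives two scalar dates per row.
df["Age"] = df.apply(
    lambda row: relativedelta(row["Today"], row["DOB"]).years, axis=1
)

# Vectorized: convert to datetime64 and compare (month, day) pairs.
dob = pd.to_datetime(df["DOB"])
today = pd.to_datetime(df["Today"])
had_birthday = (today.dt.month > dob.dt.month) | (
    (today.dt.month == dob.dt.month) & (today.dt.day >= dob.dt.day)
)
# Subtract one year for rows whose birthday is still to come this year.
df["Age vectorized"] = today.dt.year - dob.dt.year - (~had_birthday).astype(int)

print(df[["Age", "Age vectorized"]])
```

Both columns agree on this sample data. Be aware that the two approaches can disagree for people born on 29 February, because this sketch only counts the birthday once the calendar day is reached, so check which convention you need before relying on it.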

