How to calculate age based on date in python pandas? Data types errors

I have a *.csv file with the following format:

ID Date (YYYY-MM-DD)

I need to calculate the age of each person, but I can't find a way. I tried to read the column as a date using

 parse_dates=['date']

but it didn't work.

Then I tried adding a column with the current date and subtracting the two, but I got a column type error and I wasn't able to parse both to numeric. I tried pd.to_numeric(...,errors='coerce') but when I check the dtype afterwards it still isn't numeric.

I'm frustrated, as I'm just starting with Pandas and this is a very easy task in the software I'm used to, but I can't figure out how to do it here. Any help would be really appreciated.

Unless you post your csv or the code to create the dataframe, it would be difficult to answer. You may look at the link for a possible approach to your date-difference issue.

df
        A          B
one  2014-01-01  2014-02-28 
two  2014-02-03  2014-03-01

Assuming these are datetime columns (if they're not, apply to_datetime), you can just subtract them:

df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])

In [11]: df.dtypes  # if already datetime64 you don't need to use to_datetime
Out[11]:
A    datetime64[ns]
B    datetime64[ns]
dtype: object

In [12]: df['A'] - df['B']
Out[12]:
one   -58 days
two   -26 days
dtype: timedelta64[ns]

In [13]: df['C'] = df['A'] - df['B']

In [14]: df
Out[14]:
             A          B        C
one     2014-01-01   2014-02-28 -58 days
two     2014-02-03   2014-03-01 -26 days
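If a plain number is more useful than a timedelta, the `.dt.days` accessor extracts the whole-day count. A minimal sketch that rebuilds the two-row frame shown above (the `C_days` column name is only an illustration):

```python
import pandas as pd

# Rebuild the small example frame from this answer
df = pd.DataFrame(
    {"A": ["2014-01-01", "2014-02-03"], "B": ["2014-02-28", "2014-03-01"]},
    index=["one", "two"],
)
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])

# Subtracting two datetime columns gives a timedelta column
df["C"] = df["A"] - df["B"]

# .dt.days turns each timedelta into an integer number of days
df["C_days"] = df["C"].dt.days
print(df["C_days"].tolist())  # [-58, -26]
```

The integer column can then be used in ordinary arithmetic without any further type conversion.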

If you apply the parse option, you obtain a Timestamp() object. One possible option is to convert your date to str and operate on it in that format, creating a new column in your pandas dataframe, for example:

>>> for i in df['you_datetime_column'].items():  # iteritems() was removed in pandas 2.0
...     date_ref = i[1]
...     # your operation
...     df['edad'] = result  # placeholder: `result` is whatever your operation computes
>>> date_ref
Timestamp('2017-01-09 11:42:05')  # date of the last row
>>> date_ref = str(date_ref)
>>> date_ref
'2017-01-09 11:42:05'
>>> date_ref = date_ref.split()[0]
>>> date_ref
'2017-01-09'

Let's move on to calculate the age...

>>> from datetime import date
>>> def diferencia(date1, date2):
...     # both arguments are 'YYYY-MM-DD' strings
...     d1 = date(int(date1[0:4]), int(date1[5:7]), int(date1[-2:]))
...     d2 = date(int(date2[0:4]), int(date2[5:7]), int(date2[-2:]))
...     dif = d2 - d1
...     return str(round(dif.days / 365.0, 1)) + ' years'
>>> from datetime import datetime
>>> now = datetime.now().date()
>>> now
datetime.date(2018, 4, 9)
>>> now = str(now)
>>> now
'2018-04-09'
>>> diferencia(date_ref, now)
'1.2 years'
>>> diff = float(diferencia(date_ref, now).split()[0])
>>> diff
1.2
>>> type(diff)
<class 'float'>
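If "age" means completed years (the number that changes on each birthday), the string slicing can be avoided by working on the datetime column directly. The following is only a sketch under that assumption; `age_in_years` is a hypothetical helper name, and the reference date is fixed so the result is reproducible:

```python
import pandas as pd

def age_in_years(birth: pd.Series, today: pd.Timestamp) -> pd.Series:
    """Completed years between each birth date and `today` (hypothetical helper)."""
    # True where this year's birthday has not happened yet
    before_birthday = (birth.dt.month > today.month) | (
        (birth.dt.month == today.month) & (birth.dt.day > today.day)
    )
    # Year difference, minus one if the birthday is still ahead
    return today.year - birth.dt.year - before_birthday.astype(int)

births = pd.to_datetime(
    pd.Series(["2000-02-03", "1990-06-30", "1995-05-12", "1985-12-31"])
)
ages = age_in_years(births, pd.Timestamp("2018-04-09"))
print(ages.tolist())  # [18, 27, 22, 32]
```

Comparing month and day instead of dividing a day count by 365 avoids drift from leap years, at the cost of a slightly longer expression.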

Here is a step-by-step example.

You haven't provided your logic. For us to help debug your problem, you should show us both your data and your code.

import pandas as pd
from io import StringIO

mystr = StringIO("""ID  Date
1 2000-02-03
2 1990-06-30
3 1995-05-12
4 1985-12-31
""")

# replace mystr with 'file.csv'
# sep=r'\s+' splits on whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(mystr, sep=r'\s+', parse_dates=['Date'])

print(df.dtypes)

# ID               int64
# Date    datetime64[ns]
# dtype: object

df['Age'] = pd.to_datetime('now') - df['Date']

print(df)

#    ID       Date                 Age
# 0   1 2000-02-03  6640 days 09:32:54
# 1   2 1990-06-30 10145 days 09:32:54
# 2   3 1995-05-12  8368 days 09:32:54
# 3   4 1985-12-31 11787 days 09:32:54

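# divide by the average Gregorian year; np.timedelta64(1, 'Y') would need numpy
# imported and may be rejected by newer pandas versions
df['Age'] = df['Age'] / pd.Timedelta(days=365.2425)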

print(df)

#    ID       Date        Age
# 0   1 2000-02-03  18.180796
# 1   2 1990-06-30  27.777160
# 2   3 1995-05-12  22.911899
# 3   4 1985-12-31  32.272803
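One possible reason `parse_dates=['date']` did not work in the question is that the name has to match the CSV header exactly, including case ('Date' versus 'date'); that is a guess, since the file was not posted. Another common cause is values that are not valid dates. A sketch that converts after reading and makes bad values visible, using made-up sample data:

```python
import pandas as pd
from io import StringIO

# Made-up data: the second row is deliberately not a valid date
raw = StringIO("""ID,Date
1,2000-02-03
2,not a date
3,1995-05-12
""")

df = pd.read_csv(raw)

# errors='coerce' turns unparseable values into NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")

print(df["Date"].isna().sum())  # 1
```

Rows that come back as NaT can then be inspected or dropped before computing the age.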

Age could be the number of days, seconds, or years from a certain datetime, and it is unclear which you mean. Let's assume you want the number of days and, without loss of generality, that your start date is the string '2010-3-11'. Here is how I would calculate it. The main idea is to convert the string '2010-3-11' to a datetime object so that I can subtract it from today's date.

from datetime import datetime

numDays = (datetime.now() - datetime.strptime('2010-3-11', '%Y-%m-%d')).days
# the date of this post is '2018-10-3'

If I want to print the number of days, I would do:

>>> numDays
[out]    3128 
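Because `datetime.now()` changes every day, the number above can only be reproduced by fixing the reference date. A small sketch that pins it to the post date mentioned in the comment (the variable names `start`, `reference` and `num_days` are illustrative):

```python
from datetime import datetime

start = datetime.strptime('2010-3-11', '%Y-%m-%d')
reference = datetime(2018, 10, 3)  # fixed stand-in for datetime.now()

# Subtracting two datetime objects gives a timedelta; .days is its whole-day part
num_days = (reference - start).days
print(num_days)  # 3128
```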
