
Pandas get the age from a date (example: date of birth)

How can I calculate a person's age (based on the dob column) and add a column with the new value to the dataframe?

The dataframe looks like this:

    lname      fname     dob
0    DOE       LAURIE    03011979
1    BOURNE    JASON     06111978
2    GRINCH    XMAS      12131988
3    DOE       JOHN      11121986

I tried the following:

now = datetime.now()
df1['age'] = now - df1['dob']

However, I received the following error:

TypeError: unsupported operand type(s) for -: 'datetime.datetime' and 'str'
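
The subtraction fails because the dob column holds strings rather than datetimes. A minimal sketch of one way around it, assuming the values are MMDDYYYY as in the table above; the fixed reference date is an assumption made only so the result is reproducible:

```python
import pandas as pd

df1 = pd.DataFrame({
    'lname': ['DOE', 'BOURNE', 'GRINCH', 'DOE'],
    'fname': ['LAURIE', 'JASON', 'XMAS', 'JOHN'],
    'dob': ['03011979', '06111978', '12131988', '11121986'],
})

# Parse the MMDDYYYY strings into real datetimes first.
dob = pd.to_datetime(df1['dob'], format='%m%d%Y')

# Assumed fixed reference date; use pd.Timestamp.now() in practice.
now = pd.Timestamp('2024-01-01')

# True where this year's birthday has already happened (or is today).
had_birthday = (dob.dt.month < now.month) | (
    (dob.dt.month == now.month) & (dob.dt.day <= now.day)
)

# Year difference, minus one if the birthday is still ahead this year.
df1['age'] = now.year - dob.dt.year - (~had_birthday).astype(int)
print(df1)
```

The original dob strings are left untouched here; assign the parsed series back to df1['dob'] if the datetime values are needed later.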

import datetime as DT
import io
import numpy as np
import pandas as pd

pd.options.mode.chained_assignment = 'warn'

content = '''     ssno        lname         fname    pos_title             ser  gender  dob 
0    23456789    PLILEY     JODY        BUDG ANAL             0560  F      031871 
1    987654321   NOEL       HEATHER     PRTG SRVCS SPECLST    1654  F      120852
2    234567891   SONJU      LAURIE      SUPVY CONTR SPECLST   1102  F      010999
3    345678912   MANNING    CYNTHIA     SOC SCNTST            0101  F      081692
4    456789123   NAUERTZ    ELIZABETH   OFF AUTOMATION ASST   0326  F      031387'''

df = pd.read_csv(io.StringIO(content), sep=r'\s{2,}', engine='python')
df['dob'] = df['dob'].apply('{:06}'.format)

now = pd.Timestamp('now')
df['dob'] = pd.to_datetime(df['dob'], format='%m%d%y')    # 1
df['dob'] = df['dob'].where(df['dob'] < now, df['dob'] -  np.timedelta64(100, 'Y'))   # 2
df['age'] = (now - df['dob']).astype('<m8[Y]')    # 3
print(df)

yields

        ssno    lname      fname            pos_title   ser gender  \
0   23456789   PLILEY       JODY            BUDG ANAL   560      F   
1  987654321     NOEL    HEATHER   PRTG SRVCS SPECLST  1654      F   
2  234567891    SONJU     LAURIE  SUPVY CONTR SPECLST  1102      F   
3  345678912  MANNING    CYNTHIA           SOC SCNTST   101      F   
4  456789123  NAUERTZ  ELIZABETH  OFF AUTOMATION ASST   326      F   

                  dob  age  
0 1971-03-18 00:00:00   43  
1 1952-12-08 18:00:00   61  
2 1999-01-09 00:00:00   15  
3 1992-08-16 00:00:00   22  
4 1987-03-13 00:00:00   27  

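  1. It looks like your dob column currently holds strings. First, convert them to Timestamps using pd.to_datetime.
  2. The format '%m%d%y' converts the last two digits to a year, but unfortunately assumes that 52 means 2052. Since that is probably not Heather Noel's birth year, subtract 100 years from dob whenever dob is greater than now. You may want to subtract a few years from now in the condition df['dob'] < now, since a 101-year-old worker is slightly more likely than a 1-year-old worker...
  3. You can subtract dob from now to obtain a timedelta64[ns]. To convert that to years, use astype('<m8[Y]') or astype('timedelta64[Y]'). (Year-unit timedeltas depend on the pandas version; recent releases may reject them.)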
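
A version-tolerant sketch of steps 2 and 3, written without year-unit timedeltas. It assumes pd.DateOffset for the century shift and a division by 365.2425 days as a rough stand-in for the year conversion; the fixed value of now is also an assumption, chosen so the ages line up with the output above:

```python
import pandas as pd

dob = pd.Series(['031871', '120852', '010999', '081692', '031387'])
now = pd.Timestamp('2014-09-01')  # assumed fixed date for reproducibility

# Step 1: parse MMDDYY strings; %y maps '52' to 2052.
parsed = pd.to_datetime(dob, format='%m%d%y')

# Step 2: shift any date in the future back by one century.
parsed = parsed.where(parsed < now, parsed - pd.DateOffset(years=100))

# Step 3: approximate whole years from elapsed days.
# This can be off by one within a day or so of a birthday.
age = ((now - parsed).dt.days // 365.2425).astype(int)
print(pd.DataFrame({'dob': parsed, 'age': age}))
```

Because the offset is a calendar shift, the corrected date stays at midnight (1952-12-08) instead of picking up a time-of-day component.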

I found a simpler solution:

import pandas as pd
from datetime import datetime
from datetime import date

d = {'col0': [1, 2, 6], 
     'col1': [3, 8, 3], 
     'col2': ['17.02.1979', '11.11.1993', '01.08.1961']}

df = pd.DataFrame(data=d)

def calculate_age(born):
    born = datetime.strptime(born, "%d.%m.%Y").date()
    today = date.today()
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

df['age'] = df['col2'].apply(calculate_age)
print(df)

Output:

   col0  col1        col2  age
0     1     3  17.02.1979   39
1     2     8  11.11.1993   24
2     6     3  01.08.1961   57

# Data setup
df

    lname   fname        dob
0     DOE  LAURIE 1979-03-01
1  BOURNE   JASON 1978-06-11
2  GRINCH    XMAS 1988-12-13
3     DOE    JOHN 1986-11-12

# Make sure to parse all datetime columns in advance
df['dob'] = pd.to_datetime(df['dob'], errors='coerce')

If you only want the whole-year part of the age, use @unutbu's solution:

now = pd.to_datetime('now')
now
# Timestamp('2019-04-14 00:00:43.105892')

(now - df['dob']).astype('<m8[Y]') 

0    40.0
1    40.0
2    30.0
3    32.0
Name: dob, dtype: float64

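Another option is to subtract the year parts and correct for the month difference (note that this ignores the day of the month):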

(now.year - df['dob'].dt.year) - ((now.month - df['dob'].dt.month) < 0)

0    40
1    40
2    30
3    32
Name: dob, dtype: int64
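
A day-aware variant of the same idea, sketched under the assumption of a fixed reference date. Month and day are packed into one integer so that a single comparison orders both; the names birthday_key and today_key are made up for this illustration:

```python
import pandas as pd

dob = pd.to_datetime(
    pd.Series(['1979-03-01', '1978-06-11', '1988-12-13', '1986-11-12'])
)
now = pd.Timestamp('2019-04-14')  # assumed fixed date for reproducibility

# Encode (month, day) as month * 100 + day, e.g. June 11 -> 611.
birthday_key = dob.dt.month * 100 + dob.dt.day
today_key = now.month * 100 + now.day

# Subtract one year where this year's birthday is still ahead.
age = now.year - dob.dt.year - (today_key < birthday_key).astype(int)
print(age)
```

For these four rows the result matches the month-only version; the two differ only for birthdays that fall later in the current month.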

If you want the (almost) exact age, including the fractional part, take total_seconds and divide:

(now - df['dob']).dt.total_seconds() / (60*60*24*365.25)

0    40.120446
1    40.840501
2    30.332630
3    32.418872
Name: dob, dtype: float64

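My first thought is that your years have only two digits, which is not a great choice in this day and age. In any case, I will assume that all years like 05 are actually 1905. This is probably not correct(!), but coming up with the right rule will depend heavily on your data. The code below expects dob as a six-character MMDDYY string.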

from datetime import date

def age(date1, date2):
    naive_yrs = date2.year - date1.year
    if date1.replace(year=date2.year) > date2:
        correction = -1
    else:
        correction = 0
    return naive_yrs + correction

df1['age'] = df1['dob'].map(lambda x: age(date(int('19' + x[-2:]), int(x[:2]), int(x[2:-2])), date.today()))
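
Two details may be worth adjusting, sketched here with hypothetical helpers (expand_year and parse_mmddyy are not part of the answer above). The pivot rule shown is only one possible assumption about the data: it picks the latest century whose year is not after the current year. The tuple comparison sidesteps date.replace, which raises ValueError for a February 29 birthday when the target year is not a leap year.

```python
from datetime import date

def expand_year(yy, today):
    """Hypothetical pivot rule: latest century whose year is not in the future."""
    year = today.year - today.year % 100 + yy
    return year if year <= today.year else year - 100

def parse_mmddyy(s, today):
    """Parse a six-character MMDDYY string into a date."""
    return date(expand_year(int(s[4:6]), today), int(s[:2]), int(s[2:4]))

def age(born, today):
    # Comparing (month, day) tuples never constructs an invalid date.
    return today.year - born.year - (
        (today.month, today.day) < (born.month, born.day)
    )

today = date(2024, 1, 1)  # assumed fixed date for reproducibility
born = parse_mmddyy('022996', today)
print(born, age(born, today))
```

The year-level pivot is crude: a two-digit year equal to the current one is kept in the current century even if the month and day are still ahead, so real data may need a stricter rule.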

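Use this one-liner when you want the age from the dob column relative to the current year:

import pandas as pd
from datetime import datetime

df["dob"] = pd.to_datetime(df["dob"])

# Year difference only: does not check whether this year's birthday has passed
df["age"] = df["dob"].apply(lambda x: datetime.now().year - x.year)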

How about the solution below:

import datetime as dt
import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta

df1['age'] = [relativedelta(pd.to_datetime('now'), d).years for d in df1['dob']]

# Once the year, month and day parts of the DOB are in separate columns,
# the lines below give the age in whole years and months.
from datetime import datetime
import pandas as pd

tmpdf = df[['born_year', 'born_month', 'born_day']].copy()
tmpdf.columns = ["year", "month", "day"]
df['dob'] = pd.to_datetime(tmpdf, errors='coerce')
df['age_y'] = (datetime.today() - df['dob']).dt.days / 365.25
df['age_y'] = df['age_y'].astype(int)
df['age_m'] = ((datetime.today() - df['dob']).dt.days / 365.25 - df['age_y']) * 12
df['age_m'] = df['age_m'].astype(int)
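
Dividing by 365.25 is an approximation. As a possible alternative, relativedelta from dateutil (already used in an earlier answer) returns calendar-based years, months and days directly. A minimal sketch with an assumed fixed reference date:

```python
from datetime import date
from dateutil.relativedelta import relativedelta

today = date(2019, 4, 14)   # assumed fixed date for reproducibility
born = date(1978, 6, 11)

# relativedelta(later, earlier) breaks the gap into calendar units.
delta = relativedelta(today, born)
print(delta.years, delta.months, delta.days)
```

For a whole column, the same call can be applied per row, for example inside a list comprehension over the parsed dob values.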
