
Converting year to year-month-day in Python pandas (CSV)

I have close to 10 entries in a CSV file, as follows:

PatienceID      Case        Treatment     Admitted_Date      Discharged_Date
PAT1002         Fever        Yes           1929-02-10         1929-02-13
PAT1023         Ebola        Yes           2015-10-21         2015-12-29
PAT1003         HIV          No            2012               2014-02-21
PAT1991         Headache     Yes           2013               2013
PAT2028         Epilepsy     Yes           2011               2016
PAT2931         Malaria      Yes           2016-01-23         2016

If we look at the CSV, some values under Admitted_Date and/or Discharged_Date contain only a year, without month and day. I don't know how to complete these dates with a month and day so that Admitted_Date still precedes Discharged_Date. For example, with Admitted_Date = 2013 and Discharged_Date = 2013, if Admitted_Date becomes 01-01-2013, then Discharged_Date should become 12-12-2013 (January to December).

I have tried several approaches, but it only gets messier. Any help is appreciated, thank you so much.

Expected output:

PatienceID      Case        Treatment     Admitted_Date      Discharged_Date
PAT1002         Fever        Yes           1929-02-10         1929-02-13
PAT1023         Ebola        Yes           2015-10-21         2015-12-29
PAT1003         HIV          No            2012-MM-DD         2014-02-21
PAT1991         Headache     Yes           2013-MM-DD         2013-MM-DD
PAT2028         Epilepsy     Yes           2011-MM-DD         2016-MM-DD
PAT2931         Malaria      Yes           2016-01-23         2016-MM-DD

What I have tried so far:

import pandas as pd

DF = pd.read_csv('mydata.csv') 
for Admitted_Date, Discharged_Date in DF
  pd.to_datetime(mydata.pop('Date'), format="%b%Y")

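IIUC, you can first convert the columns with to_datetime, then add a YearEnd offset to the rows whose original Discharged_Date string has length 4 (year only), selecting those rows with a boolean mask and loc. In the output below, to_datetime has turned each bare year into January 1, so only the discharge side needs shifting to the end of the year: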

df['Admitted_Date'] = pd.to_datetime(df['Admitted_Date'])
# Flag year-only discharge values while the column still holds strings
mask = df['Discharged_Date'].str.len() == 4
print(mask)
0    False
1    False
2    False
3     True
4     True
5     True
Name: Discharged_Date, dtype: bool

df['Discharged_Date'] = pd.to_datetime(df['Discharged_Date'])
# Roll the year-only discharge dates forward to December 31 of the same year
df.loc[mask, 'Discharged_Date'] += pd.offsets.YearEnd()
print(df)
  PatienceID      Case Treatment Admitted_Date Discharged_Date
0    PAT1002     Fever       Yes    1929-02-10      1929-02-13
1    PAT1023     Ebola       Yes    2015-10-21      2015-12-29
2    PAT1003       HIV        No    2012-01-01      2014-02-21
3    PAT1991  Headache       Yes    2013-01-01      2013-12-31
4    PAT2028  Epilepsy       Yes    2011-01-01      2016-12-31
5    PAT2931   Malaria       Yes    2016-01-23      2016-12-31
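
For reference, here is a self-contained sketch of the same idea that completes the year-only strings before parsing. This avoids relying on to_datetime inferring a format for a mix of year-only and full-date strings, which may behave differently across pandas versions. The helper name fill_year_only and the inline CSV_TEXT (standing in for 'mydata.csv', with comma separators assumed) are illustrative and not from the original post. It also assumes there are no missing values and that a year-only entry is exactly 4 characters long.

```python
import io

import pandas as pd

# Sample rows copied from the question; this inline text stands in for 'mydata.csv'.
CSV_TEXT = """PatienceID,Case,Treatment,Admitted_Date,Discharged_Date
PAT1002,Fever,Yes,1929-02-10,1929-02-13
PAT1023,Ebola,Yes,2015-10-21,2015-12-29
PAT1003,HIV,No,2012,2014-02-21
PAT1991,Headache,Yes,2013,2013
PAT2028,Epilepsy,Yes,2011,2016
PAT2931,Malaria,Yes,2016-01-23,2016
"""


def fill_year_only(dates: pd.Series, suffix: str) -> pd.Series:
    """Append `suffix` (e.g. '-01-01') to 4-character year strings, then parse."""
    text = dates.astype(str).str.strip()
    year_only = text.str.len() == 4
    # Keep full dates as they are; complete year-only entries with the suffix.
    completed = text.where(~year_only, text + suffix)
    return pd.to_datetime(completed, format="%Y-%m-%d")


# dtype=str keeps year-only values as text instead of letting pandas guess a type.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
df["Admitted_Date"] = fill_year_only(df["Admitted_Date"], "-01-01")
df["Discharged_Date"] = fill_year_only(df["Discharged_Date"], "-12-31")
print(df)
```

Using January 1 for admissions and December 31 for discharges keeps every admission on or before its discharge for the sample rows; a different suffix (for example '-12-12', as in the question) can be passed instead.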
