Converting year to year-month-day in Python pandas (CSV)
I have close to 10 entries in a CSV file, as follows:
PatienceID Case Treatment Admitted_Date Discharged_Date
PAT1002 Fever Yes 1929-02-10 1929-02-13
PAT1023 Ebola Yes 2015-10-21 2015-12-29
PAT1003 HIV No 2012 2014-02-21
PAT1991 Headache Yes 2013 2013
PAT2028 Epilepsy Yes 2011 2016
PAT2931 Malaria Yes 2016-01-23 2016
Looking at the CSV, some values under Admitted_Date and/or Discharged_Date contain only a year, without a month and day. I don't know how to complete these dates with a month and day so that Admitted_Date precedes Discharged_Date. For example, if Admitted_Date = 2013 and Discharged_Date = 2013, then Admitted_Date should become 01-01-2013 and Discharged_Date should become 12-12-2013 (January to December).
I have tried several possibilities, but it only gets messier. Thank you very much.
Expected output:
PatienceID Case Treatment Admitted_Date Discharged_Date
PAT1002 Fever Yes 1929-02-10 1929-02-13
PAT1023 Ebola Yes 2015-10-21 2015-12-29
PAT1003 HIV No 2012-MM-DD 2014-02-21
PAT1991 Headache Yes 2013-MM-DD 2013-MM-DD
PAT2028 Epilepsy Yes 2011-MM-DD 2016-MM-DD
PAT2931 Malaria Yes 2016-01-23 2016-MM-DD
What I have tried so far:
import pandas as pd
DF = pd.read_csv('mydata.csv')
for col in ['Admitted_Date', 'Discharged_Date']:
    # attempt: this fails because the values are not in "%b%Y" format
    DF[col] = pd.to_datetime(DF[col], format="%b%Y")
IIUC, you can first convert the columns with to_datetime and then add YearEnd to the rows where the original column value has length 4 (year only), selected with a boolean mask and loc:
df['Admitted_Date'] = pd.to_datetime(df['Admitted_Date'])
mask = df['Discharged_Date'].str.len() == 4
print(mask)
0 False
1 False
2 False
3 True
4 True
5 True
Name: Discharged_Date, dtype: bool
df['Discharged_Date'] = pd.to_datetime(df['Discharged_Date'])
df.loc[mask, 'Discharged_Date' ] += pd.offsets.YearEnd()
print(df)
PatienceID Case Treatment Admitted_Date Discharged_Date
0 PAT1002 Fever Yes 1929-02-10 1929-02-13
1 PAT1023 Ebola Yes 2015-10-21 2015-12-29
2 PAT1003 HIV No 2012-01-01 2014-02-21
3 PAT1991 Headache Yes 2013-01-01 2013-12-31
4 PAT2028 Epilepsy Yes 2011-01-01 2016-12-31
5 PAT2931 Malaria Yes 2016-01-23 2016-12-31
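For reference, here is a self-contained sketch of the same idea that can be run without the original file. It completes year-only values as strings before parsing, so it does not rely on format inference for a column that mixes year-only and full dates. The question does not show the file's delimiter, so the whitespace-separated sample below is an assumption, and `complete_dates` is a hypothetical helper name, not part of pandas.

```python
import io
import pandas as pd

# Sample rows copied from the question; whitespace separation is assumed here.
raw = """PatienceID Case Treatment Admitted_Date Discharged_Date
PAT1002 Fever Yes 1929-02-10 1929-02-13
PAT1023 Ebola Yes 2015-10-21 2015-12-29
PAT1003 HIV No 2012 2014-02-21
PAT1991 Headache Yes 2013 2013
PAT2028 Epilepsy Yes 2011 2016
PAT2931 Malaria Yes 2016-01-23 2016
"""

# dtype=str keeps year-only values as text so .str.len() works on them.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)


def complete_dates(col: pd.Series, suffix: str) -> pd.Series:
    """Append `suffix` to 4-character (year-only) entries, then parse."""
    year_only = col.str.len() == 4
    # Keep full dates unchanged; extend year-only entries with the suffix.
    filled = col.where(~year_only, col + suffix)
    return pd.to_datetime(filled, format="%Y-%m-%d")


# Year-only admissions go to January 1, year-only discharges to December 31.
df["Admitted_Date"] = complete_dates(df["Admitted_Date"], "-01-01")
df["Discharged_Date"] = complete_dates(df["Discharged_Date"], "-12-31")
print(df)
```

With these suffixes, a row where both columns hold the same year ends up with the admission on the first day of that year and the discharge on the last, so the admission date never falls after the discharge date for the sample rows.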