How to combine year, month, and day columns to single datetime column?
I have the following dataframe df:
id lat lon year month day
0 381 53.30660 -0.54649 2004 1 2
1 381 53.30660 -0.54649 2004 1 3
2 381 53.30660 -0.54649 2004 1 4
and I want to create a new column df['Date'] where the year, month, and day columns are combined according to the format yyyy-m-d.
Following this post, I did:
`df['Date']=pd.to_datetime(df['year']*10000000000
+df['month']*100000000
+df['day']*1000000,
format='%Y-%m-%d%')`
The result is not what I expected, as it starts from 1970 instead of 2004, and it also contains the hour stamp, which I did not specify:
id lat lon year month day Date
0 381 53.30660 -0.54649 2004 1 2 1970-01-01 05:34:00.102
1 381 53.30660 -0.54649 2004 1 3 1970-01-01 05:34:00.103
2 381 53.30660 -0.54649 2004 1 4 1970-01-01 05:34:00.104
As the dates should be in the 2004-1-2 format, what am I doing wrong?
There is an easier way:
In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])
In [251]: df
Out[251]:
id lat lon year month day Date
0 381 53.3066 -0.54649 2004 1 2 2004-01-02
1 381 53.3066 -0.54649 2004 1 3 2004-01-03
2 381 53.3066 -0.54649 2004 1 4 2004-01-04
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
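As a small illustration of the column-name keys described above, here is a sketch that also assembles a time of day. The hour and minute columns and their values are made up for the example; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical data: the hour and minute columns are added for illustration only.
df = pd.DataFrame({
    "year": [2004, 2004],
    "month": [1, 1],
    "day": [2, 3],
    "hour": [6, 18],
    "minute": [30, 45],
})

# The column names act as the unit keys; each row is assembled into one timestamp.
df["Date"] = pd.to_datetime(df[["year", "month", "day", "hour", "minute"]])
print(df["Date"])
```

The year, month, and day columns are the minimum needed; the finer-grained columns are optional.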
One solution would be to convert these columns to string, concatenate using agg + str.join, and then convert to datetime.
df['Date'] = pd.to_datetime(
df[['year', 'month', 'day']].astype(str).agg('-'.join, axis=1))
df
id lat lon year month day Date
0 381 53.3066 -0.54649 2004 1 2 2004-01-02
1 381 53.3066 -0.54649 2004 1 3 2004-01-03
2 381 53.3066 -0.54649 2004 1 4 2004-01-04
You may also want to add an errors='coerce' argument if you have invalid datetime combinations between your columns.
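A minimal sketch of what errors='coerce' does with this approach, assuming a made-up second row (30 February) that is not a real calendar date:

```python
import pandas as pd

# Hypothetical data: the second row (2004-2-30) is deliberately invalid.
df = pd.DataFrame({"year": [2004, 2004], "month": [1, 2], "day": [2, 30]})

# Join the three columns into strings such as "2004-1-2".
date_strings = df[["year", "month", "day"]].astype(str).agg("-".join, axis=1)

# Invalid combinations become NaT instead of raising an error.
df["Date"] = pd.to_datetime(date_strings, errors="coerce")
print(df)
```

Rows that come back as NaT can then be inspected with df[df["Date"].isna()].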
To fix your code:
df['Date']=pd.to_datetime(df.year*10000+df.month*100+df.day,format='%Y%m%d')
df
Out[57]:
id lat lon year month day Date
0 381 53.3066 -0.54649 2004 1 2 2004-01-02
1 381 53.3066 -0.54649 2004 1 3 2004-01-03
2 381 53.3066 -0.54649 2004 1 4 2004-01-04
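The arithmetic behind this fix can be checked by hand. The sketch below is my reading of the question's output, not something stated in the thread: it assumes the original integers were interpreted as nanoseconds since 1970-01-01 (the default unit for integer input), which is consistent with the 05:34:00.102 timestamps shown in the question.

```python
import pandas as pd

year, month, day = 2004, 1, 2

# The question's multipliers give a 14-digit integer.
as_nanoseconds = year * 10000000000 + month * 100000000 + day * 1000000
print(as_nanoseconds)  # 20040102000000

# Read as nanoseconds since the epoch, that is about 20040.1 seconds, i.e. 05:34:00.102.
epoch_result = pd.to_datetime(as_nanoseconds)
print(epoch_result)

# The fixed multipliers give an 8-digit integer whose digits spell YYYYMMDD.
as_yyyymmdd = year * 10000 + month * 100 + day
fixed_result = pd.to_datetime(as_yyyymmdd, format="%Y%m%d")
print(as_yyyymmdd, fixed_result)
```

With the smaller multipliers the digits line up with format='%Y%m%d', so the value is parsed as a date rather than as an epoch offset.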
I struggled to find a solution because I was working with a dataset with columns in Spanish. As soon as I translated them to "year", "month", "day", and "hour", the conversion worked perfectly.
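A rough sketch of that renaming step. The Spanish column names below are guesses for illustration, since the actual dataset is not shown:

```python
import pandas as pd

# Hypothetical Spanish column names and values; the real dataset is not given.
df = pd.DataFrame({
    "año": [2004, 2004],
    "mes": [1, 1],
    "día": [2, 3],
    "hora": [6, 18],
})

# Map each Spanish name to the English key that pd.to_datetime recognizes.
spanish_to_english = {"año": "year", "mes": "month", "día": "day", "hora": "hour"}
renamed = df.rename(columns=spanish_to_english)

# Assemble the datetime from the renamed columns; the original df keeps its names.
df["Date"] = pd.to_datetime(renamed[["year", "month", "day", "hour"]])
print(df)
```

Renaming a temporary copy keeps the original column names in df while still giving to_datetime the keys it expects.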