
Combine year, month and day in Python to create a date

I have a dataframe that consists of separate columns for year, month and day. I tried to combine these individual columns into one date using:

df['myDt']=pd.to_datetime(df[['year','month','day']])

only to get the following error: "to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing". Not sure what this means... I'm already supplying the relevant columns. On checking the datatypes, I found that the Year, Month and Day columns are int64. Would that be causing an issue? Thanks, Chet

Thank you all for posting. As suggested, I'm posting the sample data set first:

            Value      mm  yy    dd
Date
2018-11-30  88.550067  11  2018  1
2018-12-31  88.906290  12  2018  1
2019-01-31  88.723000   1  2019  1
2019-02-28  89.509179   2  2019  1
2019-03-31  90.049161   3  2019  1
2019-04-30  90.523100   4  2019  1
2019-05-31  90.102484   5  2019  1
2019-06-30  91.179400   6  2019  1
2019-07-31  90.963570   7  2019  1
2019-08-31  92.159170   8  2019  1

The data source is: https://www.quandl.com/data/EIA/STEO_NGPRPUS_M

I imported the data as follows:

1. import quandl (used conda install first)
2. Used Quandl's Python code:

data=quandl.get("EIA/STEO_NGPRPUS_M", authtoken="TOKEN", start_date="2005-01-01", end_date="2005-12-31")

3. Just to note, the original data comes only with the Value column, and DateTime as index. I extracted and created the mm, yy and dd columns (month and year; dd is a column vector set to 1).

All I'm trying to do is create another column called "first of the month", so for each day of each month, the column will just show "MM/YY/1". I'm going to try out all the suggestions below shortly and get back to you guys. Thanks!!

Solution

You could use datetime.datetime along with .apply().

import datetime

d = datetime.datetime(2020, 5, 17)
date = d.date()

For pandas.to_datetime(df)

It looks like your code is fine. See the pandas.to_datetime documentation and "How to convert columns into one datetime column in pandas?".

import pandas as pd

df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5]})
pd.to_datetime(df[["year", "month", "day"]])

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
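For contrast, here is a minimal sketch of how the error quoted in the question can arise. It assumes, as the posted sample data suggests, that the columns are actually named yy, mm and dd rather than year, month and day; the one-row frame is illustrative only.

```python
import pandas as pd

# Illustrative frame using the question's column names (assumed: yy, mm, dd).
df = pd.DataFrame({"yy": [2018], "mm": [11], "dd": [1]})

try:
    # to_datetime looks for columns named year/month/day; these names do not match.
    pd.to_datetime(df[["yy", "mm", "dd"]])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is what happened, the int64 dtype is not the cause; the column names are.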

What if your YEAR, MONTH and DAY columns have different headers?

Let's say your YEAR, MONTH and DAY columns are labeled yy, mm and dd respectively, and you prefer to keep your column names unchanged. In that case you could do it as follows.

import pandas as pd

df = pd.DataFrame({'yy': [2015, 2016],
                   'mm': [2, 3],
                   'dd': [4, 5]})
df2 = df[["yy", "mm", "dd"]].copy()
df2.columns = ["year", "month", "day"]
pd.to_datetime(df2)

Output:

0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
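The same idea can be written with rename and an explicit mapping, which avoids relying on column order. This is a sketch rather than the answer's own code; the sample values are abbreviated from the question, and myDt is the column name the question used.

```python
import pandas as pd

# Abbreviated, illustrative version of the question's data.
df = pd.DataFrame({"Value": [88.550067, 88.906290],
                   "mm": [11, 12],
                   "yy": [2018, 2018],
                   "dd": [1, 1]})

# Map the existing names onto the names pd.to_datetime expects.
# rename returns a new frame, so df keeps its original column names.
parts = df[["yy", "mm", "dd"]].rename(
    columns={"yy": "year", "mm": "month", "dd": "day"})
df["myDt"] = pd.to_datetime(parts)

print(df)
```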

You should use the apply method as follows:

from datetime import datetime
df['myDt'] = df.apply(lambda row: datetime.strptime(f"{int(row.year)}-{int(row.month)}-{int(row.day)}", '%Y-%m-%d'), axis=1)

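Running example:

>>> d = {'year': list(range(2015, 2020)), 'month': list(range(5, 10)), 'day': list(range(20, 25))}
>>> df = pd.DataFrame(d)
>>> df['myDt'] = df.apply(lambda row: datetime.strptime(f"{int(row.year)}-{int(row.month)}-{int(row.day)}", '%Y-%m-%d'), axis=1)
>>> df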

    year    month   day myDt
0   2015    5       20  2015-05-20
1   2016    6       21  2016-06-21
2   2017    7       22  2017-07-22
3   2018    8       23  2018-08-23
4   2019    9       24  2019-09-24

Here is a two-liner:

df['dateInt']=df['year'].astype(str) + df['month'].astype(str).str.zfill(2)+ df['day'].astype(str).str.zfill(2)
df['Date'] = pd.to_datetime(df['dateInt'], format='%Y%m%d')

Output:

    year  month day dateInt     Date
0   2015    5   20  20150520    2015-05-20
1   2016    6   21  20160621    2016-06-21
2   2017    7   22  20170722    2017-07-22
3   2018    8   23  20180823    2018-08-23
4   2019    9   24  20190924    2019-09-24
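As an alternative to assembling strings: the question mentions that the original data has a DateTime index, so a "first of the month" column could possibly be derived from that index directly, without the helper columns. The sketch below assumes a DatetimeIndex named Date; first_of_month is a hypothetical column name.

```python
import pandas as pd

# Illustrative data shaped like the question's: month-end dates as the index.
idx = pd.to_datetime(["2018-11-30", "2018-12-31", "2019-01-31"])
df = pd.DataFrame({"Value": [88.550067, 88.906290, 88.723000]}, index=idx)
df.index.name = "Date"

# Convert each date to its monthly period, then back to a timestamp
# at the start of that period (the first day of the month).
df["first_of_month"] = df.index.to_period("M").to_timestamp()

print(df)
```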

# Add and calculate a new Calculated_Date column

df['Calculated_Date'] = df[['year', 'month', 'day']].apply(lambda x: '{}-{}-{}'.format(x['year'], x['month'], x['day']), axis=1)

df['Calculated_Date'].head()

# Parse the Calculated_Date column into datetime objects (optional; only if you need real dates rather than strings)

df['Calculated_Date'] = pd.to_datetime(df['Calculated_Date'])

df['Calculated_Date'].head()

Improving on the answer from @lmiguelvargasf: sometimes you want to store the result as a datetime. Furthermore, using apply is (IMHO) better when the dataframe has other columns holding values (sales, for example).

import datetime

df['dt'] = df.apply(lambda row: datetime.datetime(int(row.yy),
                                                  int(row.mm),
                                                  int(row.dd)), axis=1)
df.head()

Note: my example only works if the yy value is a four-digit year such as 2022. If your yy value is a two-digit year such as 21, you need to modify it, for example to 2000 + int(row.yy).
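The two-digit-year adjustment can also be done without apply by passing a dict of columns to pd.to_datetime. This is a sketch under the assumption that every yy value belongs to the 2000s; the sample values are made up.

```python
import pandas as pd

# Illustrative frame with two-digit years (assumed to mean 20xx).
df = pd.DataFrame({"yy": [21, 22], "mm": [2, 3], "dd": [4, 5]})

# pd.to_datetime accepts a dict-like with year/month/day keys,
# so the offset can be applied to the whole column at once.
df["dt"] = pd.to_datetime({"year": 2000 + df["yy"],
                           "month": df["mm"],
                           "day": df["dd"]})

print(df)
```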
