
Strange behavior from to_datetime()

I have really been having a tough time with this.

My DataFrame looks like this:

     Purchase_Date     Customer_ID  Gender  
0   2012-12-18 00:00:00   7223        F 
1   2012-12-20 00:00:00   7841        M     
2   2012-12-21 00:00:00   8374        F

My goal is to convert the "Purchase_Date" column from strings to datetime objects so that I can run a cohort analysis by applying this function to it:

import datetime as dt

def get_month(x): return dt.datetime(x.year, x.month, 1)
data['InvoiceMonth'] = data['Purchase_Date'].apply(get_month)
grouping = data.groupby('Customer_ID')['InvoiceMonth']
data['CohortMonth'] = grouping.transform('min')

The function raises AttributeError: 'str' object has no attribute 'year'. I have tried the following functions and played with all of their arguments (dayfirst, yearfirst, ...):

data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])
pd.to_datetime()
datetime.datetime.strptime()

I keep getting ValueError: day is out of range for month.

Please help.

So, you were almost there:

import pandas as pd

data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])
data['InvoiceMonth'] = data["Purchase_Date"].dt.strftime("%Y-%m-01")

(This outputs the month as strings, i.e. object dtype; you can convert it to datetime by wrapping it in pd.to_datetime(...).)

Or alternatively, using your approach:

import datetime as dt
import pandas as pd

# Convert the strings first so each element has .year and .month
data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])

def get_month(x): return dt.datetime(x.year, x.month, 1)

data['InvoiceMonth'] = data["Purchase_Date"].apply(get_month)

(This outputs the month as datetime.)

Both produce the following, although I would highly recommend the first option:

  Purchase_Date  Customer_ID Gender InvoiceMonth
0    2012-12-18         7223      F   2012-12-01
1    2012-12-20         7841      M   2012-12-01
2    2012-12-21         8374      F   2012-12-01
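For completeness, here is a minimal end-to-end sketch of the cohort step from the question, run after the conversion. It uses the sample rows above plus one hypothetical extra purchase by customer 7223 in January 2013, added only so that CohortMonth differs from InvoiceMonth for at least one row.

```python
import pandas as pd

# Sample rows from the question, plus one HYPOTHETICAL repeat purchase by 7223
data = pd.DataFrame({
    "Purchase_Date": ["2012-12-18 00:00:00", "2012-12-20 00:00:00",
                      "2012-12-21 00:00:00", "2013-01-05 00:00:00"],
    "Customer_ID": [7223, 7841, 8374, 7223],
    "Gender": ["F", "M", "F", "F"],
})

# Step 1: parse the strings so the .dt accessor (and .year/.month) work
data["Purchase_Date"] = pd.to_datetime(data["Purchase_Date"])

# Step 2: first day of the purchase month, converted back to datetime
data["InvoiceMonth"] = pd.to_datetime(data["Purchase_Date"].dt.strftime("%Y-%m-01"))

# Step 3: each customer's cohort is their earliest InvoiceMonth (code from the question)
grouping = data.groupby("Customer_ID")["InvoiceMonth"]
data["CohortMonth"] = grouping.transform("min")

print(data[["Customer_ID", "InvoiceMonth", "CohortMonth"]])
```

Because InvoiceMonth is a real datetime column here, transform("min") compares dates chronologically rather than as text.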

The error comes from get_month: Purchase_Date still holds strings, so you first need to convert it to a datetime Series:

import datetime as dt
import pandas as pd

def get_month(x): return dt.datetime(x.year, x.month, 1)

data.Purchase_Date = pd.to_datetime(data.Purchase_Date, format='%Y-%m-%d %H:%M:%S')
data['Purchase_Date'].apply(get_month)

# 0   2012-12-01
# 1   2012-12-01
# 2   2012-12-01
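If pd.to_datetime itself raises ValueError: day is out of range for month, a likely cause (an assumption, since the question shows only three clean rows) is that some value in the full column is not a valid calendar date. A sketch of one way to locate such rows: parse with errors="coerce", which turns unparseable values into NaT instead of raising, then inspect the rows that became NaT. The value "2012-02-30 00:00:00" below is hypothetical.

```python
import pandas as pd

# HYPOTHETICAL data: the second value is not a real calendar date
raw = pd.Series(["2012-12-18 00:00:00", "2012-02-30 00:00:00", "2012-12-21 00:00:00"])

# errors="coerce" turns values that cannot be parsed into NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Rows that had a value but could not be parsed are the ones behind the ValueError
bad_rows = raw[parsed.isna() & raw.notna()]
print(bad_rows)
```

Once the offending rows are found, you can fix or drop them and then run the conversion without errors="coerce".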

You can also get InvoiceMonth using MonthBegin, so you don't have to define get_month:

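import pandas as pd
from pandas.tseries.offsets import MonthBegin

data.Purchase_Date = pd.to_datetime(data.Purchase_Date, format='%Y-%m-%d %H:%M:%S')
# Caveat: a date that is already the 1st of a month moves back to the previous month
data['InvoiceMonth'] = data.Purchase_Date - MonthBegin(1)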

data['InvoiceMonth']
# 0   2012-12-01
# 1   2012-12-01
# 2   2012-12-01
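Subtracting MonthBegin(1) always rolls back to an earlier month start, so a purchase made on the 1st of a month is assigned to the prior month. This does not affect the three sample rows, but it can skew a cohort table. The sketch below uses hypothetical dates to show the effect, along with one alternative: converting to a monthly period with .dt.to_period("M") and back with .dt.to_timestamp(), which keeps first-of-month dates in place.

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# HYPOTHETICAL dates; the first one already falls on the 1st of the month
dates = pd.Series(pd.to_datetime(["2012-12-01", "2012-12-18", "2012-12-31"]))

# Rolls each date back to an earlier month start; 2012-12-01 becomes 2012-11-01
shifted = dates - MonthBegin(1)

# Monthly period -> timestamp gives the start of each date's own month
month_start = dates.dt.to_period("M").dt.to_timestamp()

print(shifted.tolist())
print(month_start.tolist())
```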
