
Convert a pandas column of string dates to compare with datetime.date

I have a column of string values in pandas that look like this: 2022-07-01 00:00:00+00:00

I want to compare it to a couple of dates as follows:

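import calendar
import datetime
month_start_date = datetime.date(start_year, start_month, 1)
month_end_date = datetime.date(start_year, start_month, calendar.monthrange(start_year, start_month)[1])
# Combine boolean Series with & (element-wise); the keyword `and` raises a ValueError here
df = df[(df['date'] >= month_start_date) & (df['date'] <= month_end_date)]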

How do I convert the string values to datetime.date?

I tried pd.to_datetime(df['date']), but it says it can't compare datetime to date.
I tried pd.to_datetime(df['date']).dt.date, which says .dt can only be used with datetime-like values, did you mean 'at'.

I also tried to normalize it, but that brings more errors about timezones, and about tz-aware versus tz-naive values.

I also tried .astype('datetime64[ns]').

None of it is working.

UPDATE

It turns out none of the above works because half the data is in this format: 2022-07-01 00:00:00+00:00

And the rest is in this format: 2022-07-01

Here is how I am getting around the issue:

for index, row in df_uscis.iterrows():
    df_uscis.loc[index, 'date'] = datetime.datetime.strptime(row['date'].split(' ')[0], "%Y-%m-%d").date()

Is there a simpler and faster way of doing this? I tried to make a new column with the date values only, but I'm not sure how to do that.

From your update, if you only need to turn the values from strings into date objects, you can try:

df['date'] = pd.to_datetime(df['date'].str.split(' ').str[0])
df['date'] = df['date'].dt.date
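
As a minimal, self-contained sketch of those two lines applied to the month filter from the question: the sample data, the value column, and the values chosen for start_year and start_month are made up for illustration and are not from the original post.

```python
import calendar
import datetime

import pandas as pd

# Hypothetical sample data mixing the two formats described in the question.
df = pd.DataFrame({
    "date": [
        "2022-07-01 00:00:00+00:00",
        "2022-07-15",
        "2022-08-01 00:00:00+00:00",
        "2022-06-30",
    ],
    "value": [1, 2, 3, 4],
})

# Keep only the part before the first space, parse it, then reduce to datetime.date.
df["date"] = pd.to_datetime(df["date"].str.split(" ").str[0]).dt.date

start_year, start_month = 2022, 7  # assumed values for illustration
month_start_date = datetime.date(start_year, start_month, 1)
month_end_date = datetime.date(
    start_year, start_month, calendar.monthrange(start_year, start_month)[1]
)

# Element-wise comparison needs & (with parentheses), not the keyword `and`.
filtered = df[(df["date"] >= month_start_date) & (df["date"] <= month_end_date)]
print(filtered)
```

Note that after .dt.date the column holds plain Python date objects (object dtype), so the .dt accessor is no longer available on it. Dropping the time and offset this way also assumes the calendar date as written in the string is the one you want to compare against.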

Also, try to avoid iterrows: it is very slow, and there is usually a vectorized way to achieve what you're trying to accomplish. If you really do need to iterate through a DataFrame, try the df.itertuples() method instead.
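
If a loop really is unavoidable, a rough sketch with itertuples could look like this. The sample data and the date_only column name are hypothetical, and the vectorized version above is still likely to be the better choice.

```python
import datetime

import pandas as pd

# Hypothetical sample data in the two formats from the question.
df = pd.DataFrame({"date": ["2022-07-01 00:00:00+00:00", "2022-07-15"]})

parsed = []
# index=False yields namedtuples with one attribute per column (here: row.date).
for row in df.itertuples(index=False):
    # Take the part before the first space and parse it as an ISO date.
    parsed.append(datetime.date.fromisoformat(row.date.split(" ")[0]))

# Assign the results once, as a new column, instead of writing cell by cell.
df["date_only"] = parsed
```

Collecting the values in a list and assigning them in one step avoids the repeated df.loc writes inside the loop, which are a large part of what makes the original workaround slow.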
