
How to compare just the date, or the date and time ignoring seconds, in a Python pandas dataframe column of mixed data types?

In a pandas dataframe, I have a column of mixed data types, such as text, integers and datetimes. I need to find the rows where the datetimes match: (1) the exact value in some cases, (2) only the date (ignoring the time), or (3) the date and time, ignoring seconds.

In the following code example with a mixed data type dataframe column, there are three datetimes of varying precision. Mapping the condition into a separate dataframe works for an exact value.

import pandas as pd
import numpy as np
# example data frame
inp = [{'Id': 0, 'mixCol': np.nan},
       {'Id': 1, 'mixCol': "text"},
       {'Id': 2, 'mixCol': 43831},
       {'Id': 3, 'mixCol': pd.to_datetime("2020-01-01 00:00:00")}, 
       {'Id': 4, 'mixCol': pd.to_datetime("2020-01-01 01:01:00")},
       {'Id': 5, 'mixCol': pd.to_datetime("2020-01-01 01:01:01")}
       ]
df = pd.DataFrame(inp)
print(df.dtypes)

myMap = pd.DataFrame()
myMap["Exact"] = df["mixCol"] == pd.to_datetime("2020-01-01 01:01:01")

0   False
1   False
2   False
3   False
4   False
5   True

The output I need should be:

Id   Exact    DateOnly    NoSeconds
0    False    False       False
1    False    False       False
2    False    False       False
3    False    True        False
4    False    True        True
5    True     True        True

But mapping just the date, without a time, behaves as if the date had a time of 00:00:00, so only the midnight value matches.

myMap["DateOnly"] = df["mixCol"] == pd.to_datetime("2020-01-01")

Id   Exact    DateOnly
0    False    False
1    False    False
2    False    False
3    False    True
4    False    False
5    True     False
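
The behaviour above follows from how a date-only string is parsed. A minimal sketch, using only the values from the example, showing that the parsed Timestamp sits at midnight and therefore equals only the 00:00:00 entry:

```python
import pandas as pd

date_only = pd.to_datetime("2020-01-01")
midnight = pd.to_datetime("2020-01-01 00:00:00")
later = pd.to_datetime("2020-01-01 01:01:00")

# A date-only string parses to a Timestamp at 00:00:00 on that day
same_as_midnight = date_only == midnight
# A later time on the same day is a different Timestamp
same_as_later = date_only == later
# Comparing the calendar dates ignores the time component
same_day = date_only.date() == later.date()

print(same_as_midnight, same_as_later, same_day)  # True False True
```

So an equality test against the parsed date can only match rows whose time is exactly midnight; the time has to be stripped from the column values before comparing.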

Trying to take the date of the values in the mixed column throws AttributeError: 'Series' object has no attribute 'date', and trying to use ">=" and "<" to define the relevant range throws TypeError: '>=' not supported between instances of 'str' and 'Timestamp'.

myMap["DateOnly"] = df["mixCol"].date == pd.to_datetime("2020-01-01")
myMap["NoSeconds"] = (df["mixCol"] >= pd.to_datetime("2020-01-01 01:01:00")) & (df["mixCol"] < pd.to_datetime("2020-01-01 01:02:00"))
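
Both errors seem to come from the column having object dtype: `.date` is a method of individual Timestamp values rather than of the Series, and `>=` is applied to every element, including the string. A minimal sketch of an element-wise workaround, assuming the datetimes are stored as `pd.Timestamp` objects and that an element-wise pass is acceptable for the data size; the helper names `same_day` and `same_minute` are hypothetical:

```python
import numpy as np
import pandas as pd

mix = pd.Series([np.nan, "text", 43831,
                 pd.Timestamp("2020-01-01 00:00:00"),
                 pd.Timestamp("2020-01-01 01:01:00"),
                 pd.Timestamp("2020-01-01 01:01:01")])

def same_day(value, target):
    """True only when value is a Timestamp on the same calendar date as target."""
    return isinstance(value, pd.Timestamp) and value.date() == target.date()

def same_minute(value, target):
    """True only when value is a Timestamp within the same minute as target."""
    return isinstance(value, pd.Timestamp) and value.floor("min") == target.floor("min")

target = pd.Timestamp("2020-01-01 01:01:00")

# The type check runs first, so text, integers and NaN never reach the comparison
date_only = mix.map(lambda v: same_day(v, target))
no_seconds = mix.map(lambda v: same_minute(v, target))

print(pd.DataFrame({"DateOnly": date_only, "NoSeconds": no_seconds}))
```

Because the `isinstance` check short-circuits, no comparison between a string and a Timestamp is ever attempted.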

If I try to follow the solution for mixed columns in pandas proposed here, both the np.nan and the text value map as True, as if they were dates.

df["IsDate"] = df.apply(pd.to_datetime, errors='coerce',axis=1).nunique(1).eq(1).map({True:True ,False:False})
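
One way to flag which cells actually hold datetimes, without parsing text or numbers at all, is a type check on each value. This is only a sketch, assuming the datetimes are stored as `pd.Timestamp` or `datetime.datetime` objects; it is not the approach from the linked solution:

```python
import datetime as dt

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": range(6),
    "mixCol": [np.nan, "text", 43831,
               pd.to_datetime("2020-01-01 00:00:00"),
               pd.to_datetime("2020-01-01 01:01:00"),
               pd.to_datetime("2020-01-01 01:01:01")],
})

# True only for real datetime objects; the isna guard excludes missing values such as NaT
df["IsDate"] = df["mixCol"].map(
    lambda v: isinstance(v, dt.datetime) and not pd.isna(v)
)

print(df[["Id", "IsDate"]])
```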

I'm not sure how to proceed in this situation.

Use Series.dt.normalize to compare datetimes with the time removed (it is set to 00:00:00), or Series.dt.floor by day or by minute to drop the smaller units such as seconds:

# convert the column to datetimes; values that cannot be parsed become NaT
d = pd.to_datetime(df["mixCol"], errors='coerce')

# date only: reset the time to 00:00:00, then compare (two equivalent options)
myMap["DateOnly"] = d.dt.normalize() == pd.to_datetime("2020-01-01")
myMap["DateOnly"] = d.dt.floor('D') == pd.to_datetime("2020-01-01")

# alternative: compare datetime.date objects
myMap["DateOnly"] = d.dt.date == pd.to_datetime("2020-01-01").date()

# ignore seconds: round down to the minute, then compare
myMap['NoSeconds'] = d.dt.floor('min') == pd.to_datetime("2020-01-01 01:01:00")

print (myMap)
   Exact  DateOnly  NoSeconds
0  False     False      False
1  False     False      False
2  False     False      False
3  False      True      False
4  False      True       True
5   True      True       True
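
A self-contained variant of the same idea is sketched below. It rests on one assumption not stated in the answer: depending on the pandas version, `pd.to_datetime` may read a bare integer such as 43831 as an offset from the Unix epoch instead of coercing it to NaT. That does not affect this example, but to keep such values out entirely the sketch blanks every non-Timestamp cell before converting. The names `is_ts` and `my_map` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Id": range(6),
    "mixCol": [np.nan, "text", 43831,
               pd.to_datetime("2020-01-01 00:00:00"),
               pd.to_datetime("2020-01-01 01:01:00"),
               pd.to_datetime("2020-01-01 01:01:01")],
})

# Keep only real Timestamp objects; everything else becomes missing, then NaT
is_ts = df["mixCol"].map(lambda v: isinstance(v, pd.Timestamp))
d = pd.to_datetime(df["mixCol"].where(is_ts), errors="coerce")

my_map = pd.DataFrame({"Id": df["Id"]})
# Exact match on the full timestamp
my_map["Exact"] = d == pd.Timestamp("2020-01-01 01:01:01")
# Same calendar date: the time is reset to midnight before comparing
my_map["DateOnly"] = d.dt.normalize() == pd.Timestamp("2020-01-01")
# Same minute: values are rounded down to the minute before comparing
my_map["NoSeconds"] = d.dt.floor("min") == pd.Timestamp("2020-01-01 01:01:00")

print(my_map)
```

Comparisons against NaT evaluate to False, so the text, integer and NaN rows end up False in every column.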
