DataFrame: Compare dates from two different columns

Compare the dates in two different columns to check whether they fall on the same day, ignoring the time of day.

df:

            a                           b      
    0   2020-07-17 00:00:01.999    2020-07-17 12:00:01.999
    1   2020-06-15 13:14:01.999    2020-02-14 12:00:01.999
    2   2020-09-05 16:14:01.999    2020-09-05 11:59:01.999
    3   2020-11-17 23:14:01.999    2020-11-17 05:30:01.999

Expected output:

            a                           b                       output
    0   2020-07-17 00:00:01.999    2020-07-17 12:00:01.999       True
    1   2020-06-15 13:14:01.999    2020-02-14 12:00:01.999       False
    2   2020-09-05 16:14:01.999    2020-09-05 11:59:01.999       True
    3   2020-11-17 23:14:01.999    2020-11-17 05:30:01.999       True

Should I convert the dates to strings (with strftime) and compare them, or is there a better way?
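For reference, the string-based approach the question asks about does work. A minimal sketch, assuming the sample values above are stored as text: parse them, format only the calendar date with .dt.strftime, and compare the resulting strings. It is shown for comparison only; the answers below avoid the round trip through text.

```python
import pandas as pd

# Sample data from the question, kept as plain strings
df = pd.DataFrame({
    "a": ["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999",
          "2020-09-05 16:14:01.999", "2020-11-17 23:14:01.999"],
    "b": ["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999",
          "2020-09-05 11:59:01.999", "2020-11-17 05:30:01.999"],
})

# Parse to datetime64, then keep only the calendar date as "YYYY-MM-DD" text
a_day = pd.to_datetime(df["a"]).dt.strftime("%Y-%m-%d")
b_day = pd.to_datetime(df["b"]).dt.strftime("%Y-%m-%d")

# String equality of the formatted dates means "same day"
df["output"] = a_day == b_day
print(df)
```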

Convert the columns to datetime objects, either with pd.to_datetime or by passing parse_dates to read_csv when loading the CSV. Then use the .dt.date accessor to compare only the date parts:

In [22]: df = pd.read_csv("a.csv", parse_dates=["a","b"])

In [23]: df
Out[23]:
                        a                       b
0 2020-07-17 00:00:01.999 2020-07-17 12:00:01.999
1 2020-06-15 13:14:01.999 2020-02-14 12:00:01.999
2 2020-09-05 16:14:01.999 2020-09-05 11:59:01.999
3 2020-11-17 23:14:01.999 2020-11-17 05:30:01.999

In [24]: df["c"] = df["a"].dt.date == df["b"].dt.date

In [25]: df
Out[25]:
                        a                       b      c
0 2020-07-17 00:00:01.999 2020-07-17 12:00:01.999   True
1 2020-06-15 13:14:01.999 2020-02-14 12:00:01.999  False
2 2020-09-05 16:14:01.999 2020-09-05 11:59:01.999   True
3 2020-11-17 23:14:01.999 2020-11-17 05:30:01.999   True
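A variation on the same idea, sketched below under the assumption that the columns are already parsed as datetimes: .dt.date returns an object-dtype Series of Python datetime.date values, whereas .dt.normalize() only resets the time of day to midnight and keeps the datetime64 dtype. Comparing normalized values therefore stays vectorized, which typically helps on large frames.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "a": ["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999",
          "2020-09-05 16:14:01.999", "2020-11-17 23:14:01.999"],
    "b": ["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999",
          "2020-09-05 11:59:01.999", "2020-11-17 05:30:01.999"],
})
df["a"] = pd.to_datetime(df["a"])
df["b"] = pd.to_datetime(df["b"])

# normalize() sets each timestamp to midnight of its own day but keeps datetime64 dtype,
# so equal normalized values mean the two timestamps fall on the same calendar day
df["c"] = df["a"].dt.normalize() == df["b"].dt.normalize()
print(df)
```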

What you have are Timestamps, so to get the date out of each one, use its .date() method. Assuming the DataFrame is df:

df['output'] = df.apply(lambda row: row['a'].date() == row['b'].date(), axis=1)

If columns 'a' and 'b' are strings, use:

df['output'] = df.apply(lambda row: pd.Timestamp(row['a']).date() == pd.Timestamp(row['b']).date(), axis=1)
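To check that the row-wise apply version gives the same result as the vectorized .dt.date comparison, here is a small self-contained sketch using the question's sample data. Note that apply with axis=1 runs a Python-level loop over the rows, so the vectorized form is usually preferable for large frames.

```python
import pandas as pd

# Sample data from the question, parsed to datetime64
df = pd.DataFrame({
    "a": ["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999",
          "2020-09-05 16:14:01.999", "2020-11-17 23:14:01.999"],
    "b": ["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999",
          "2020-09-05 11:59:01.999", "2020-11-17 05:30:01.999"],
})
df["a"] = pd.to_datetime(df["a"])
df["b"] = pd.to_datetime(df["b"])

# Row-wise version: calls Timestamp.date() on each value, one row at a time
row_wise = df.apply(lambda row: row["a"].date() == row["b"].date(), axis=1)

# Vectorized version: the .dt accessor works on whole columns at once
vectorized = df["a"].dt.date == df["b"].dt.date

print(row_wise.tolist())
print(vectorized.tolist())
```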

First convert your columns to datetime columns with pd.to_datetime:

df['a'] = pd.to_datetime(df['a'])
df['b'] = pd.to_datetime(df['b'])

Then use np.where to create a new column, comparing only the dates:

import numpy as np
df['output'] = np.where(df['a'].dt.date == df['b'].dt.date, True, False)

Output:

    a                           b                           output
0   2020-07-17 00:00:01.999    2020-07-17 12:00:01.999       True
1   2020-06-15 13:14:01.999    2020-02-14 12:00:01.999       False
2   2020-09-05 16:14:01.999    2020-09-05 11:59:01.999       True
3   2020-11-17 23:14:01.999    2020-11-17 05:30:01.999       True
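Two small refinements to this answer, sketched under the assumption that every value follows the "YYYY-MM-DD HH:MM:SS.fff" layout shown in the question. First, passing an explicit format to pd.to_datetime makes parsing strict, so a value that does not match raises an error instead of being guessed. Second, the == comparison already produces a boolean Series, so wrapping it in np.where(..., True, False) is not required.

```python
import pandas as pd

# Sample data from the question, as strings
df = pd.DataFrame({
    "a": ["2020-07-17 00:00:01.999", "2020-06-15 13:14:01.999",
          "2020-09-05 16:14:01.999", "2020-11-17 23:14:01.999"],
    "b": ["2020-07-17 12:00:01.999", "2020-02-14 12:00:01.999",
          "2020-09-05 11:59:01.999", "2020-11-17 05:30:01.999"],
})

# Assumed layout of the input strings; %f accepts the three fractional digits
fmt = "%Y-%m-%d %H:%M:%S.%f"
df["a"] = pd.to_datetime(df["a"], format=fmt)
df["b"] = pd.to_datetime(df["b"], format=fmt)

# The comparison itself is already a boolean Series
df["output"] = df["a"].dt.date == df["b"].dt.date
print(df)
```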
