Check if any of three columns is within a column date range

Question

I have a DataFrame containing three datetime columns:

tp.loc[:, ['Arrival1', 'Arrival2', 'Departure']].head()

        Arrival1            Arrival2           Departure
0 2018-11-26 05:45:00 2018-11-26 12:00:00 2018-1-26 08:00:00
1 2018-11-26 22:00:00 2018-11-27 00:00:00 2018-11-26 23:00:00
2 2018-11-26 05:45:00 2018-11-26 08:15:00 2018-11-26 06:45:00
3 2018-11-26 07:30:00 2018-11-26 10:15:00 2018-11-26 08:30:00
4 2018-12-02 07:30:00 2018-12-02 21:30:00 2018-12-02 08:00:00

I want to get only the rows of tp where Arrival1, Arrival2 or Departure (any of the three) falls within any of the date ranges (banStartDate to banEndDate) given by the rows of the following DataFrame:

db.loc[db['country'] == 'AT']

    country        banStartDate          banEndDate
102      AT 2018-12-01 14:00:00 2018-12-01 22:59:00
161      AT 2018-12-01 23:00:00 2018-12-02 21:00:00
51       AT 2018-12-07 23:00:00 2018-12-08 22:59:00

In this example, I want only row #4 to be retrieved from tp, since its Arrival1 and Departure fall within one of the date ranges of db (2018-12-01 23:00:00 to 2018-12-02 21:00:00).

Is there an easy way to do so?

Answer 1

After reading in your dataframes with pd.read_csv(), you can build one boolean mask per row of db in a list comprehension, select the matching rows of tp with each mask, combine the pieces with pd.concat(), and finally call drop_duplicates() to remove rows that matched more than one range:

from io import StringIO
import pandas as pd

df1 = StringIO('''
            Arrival1            Arrival2           Departure
0  2018-11-26 05:45:00  2018-11-26 12:00:00  2018-1-26 08:00:00
1  2018-11-26 22:00:00  2018-11-27 00:00:00  2018-11-26 23:00:00
2  2018-11-26 05:45:00  2018-11-26 08:15:00  2018-11-26 06:45:00
3  2018-11-26 07:30:00  2018-11-26 10:15:00  2018-11-26 08:30:00
4  2018-12-02 07:30:00  2018-12-02 21:30:00  2018-12-02 08:00:00
''')

df2 = StringIO('''
    country        banStartDate          banEndDate
102      AT  2018-12-01 14:00:00  2018-12-01 22:59:00
161      AT  2018-12-01 23:00:00  2018-12-02 21:00:00
51       AT  2018-12-07 23:00:00  2018-12-08 22:59:00
''')

tp = pd.read_csv(df1, sep=r'\s{2,}', engine='python', parse_dates=[0,1,2])
db = pd.read_csv(df2, sep=r'\s{2,}', engine='python', parse_dates=[1,2]).reset_index()

pd.concat([tp.loc[((tp>db.loc[i,'banStartDate']) & (tp<db.loc[i,'banEndDate'])).any(axis=1)] for i in range(db.shape[0])]).drop_duplicates()

Returns:

             Arrival1            Arrival2           Departure
4 2018-12-02 07:30:00 2018-12-02 21:30:00 2018-12-02 08:00:00
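
As a possible variation on the approach above, the per-range masks can be OR-ed together into a single mask instead of concatenating and de-duplicating. This is only a sketch: the frames are rebuilt by hand from the question's sample data (with the first Departure value written zero-padded as 2018-01-26), and the names mask and result are illustrative. One reason to consider it is that drop_duplicates() would also collapse rows of tp that are genuinely identical, whereas a mask keeps them.

```python
import pandas as pd

# Sample data copied from the question (first Departure value zero-padded).
tp = pd.DataFrame({
    'Arrival1': pd.to_datetime(['2018-11-26 05:45:00', '2018-11-26 22:00:00',
                                '2018-11-26 05:45:00', '2018-11-26 07:30:00',
                                '2018-12-02 07:30:00']),
    'Arrival2': pd.to_datetime(['2018-11-26 12:00:00', '2018-11-27 00:00:00',
                                '2018-11-26 08:15:00', '2018-11-26 10:15:00',
                                '2018-12-02 21:30:00']),
    'Departure': pd.to_datetime(['2018-01-26 08:00:00', '2018-11-26 23:00:00',
                                 '2018-11-26 06:45:00', '2018-11-26 08:30:00',
                                 '2018-12-02 08:00:00']),
})
db = pd.DataFrame({
    'country': ['AT', 'AT', 'AT'],
    'banStartDate': pd.to_datetime(['2018-12-01 14:00:00', '2018-12-01 23:00:00',
                                    '2018-12-07 23:00:00']),
    'banEndDate': pd.to_datetime(['2018-12-01 22:59:00', '2018-12-02 21:00:00',
                                  '2018-12-08 22:59:00']),
}, index=[102, 161, 51])

# Start with "no row matches" and OR in the rows hit by each ban range.
mask = pd.Series(False, index=tp.index)
for start, end in zip(db['banStartDate'], db['banEndDate']):
    # True for a row if any of its three datetimes lies strictly inside the range.
    mask |= ((tp > start) & (tp < end)).any(axis=1)

result = tp[mask]
print(result)
```

With the sample data only the row with index 4 is selected, matching the output shown above.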

Answer 2

You can use pandas.DataFrame.any with axis=1 (or axis='columns') to find the rows where any of the dates lies between the start and the end of a range. You will need three of these, or a for loop, one for each row of db for the country in question.

Also, I believe (I could be wrong) that you will need to convert those strings into Python datetime objects. The code would look similar to this:

from datetime import datetime
tp[((tp > datetime.strptime(Start_Date, '%Y-%m-%d %H:%M:%S')) & (tp < datetime.strptime(End_Date, '%Y-%m-%d %H:%M:%S'))).any(axis=1)]
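
To make the string-conversion step concrete, here is a minimal sketch for a single date range. It assumes the three columns arrive as strings, converts them with pd.to_datetime, and tests each column with Series.between, which is inclusive on both ends by default (unlike the strict comparisons above). The two-row frame and the names start, end, in_window and result are illustrative only.

```python
import pandas as pd

# Two rows taken from the question, deliberately left as strings.
tp = pd.DataFrame({
    'Arrival1': ['2018-11-26 07:30:00', '2018-12-02 07:30:00'],
    'Arrival2': ['2018-11-26 10:15:00', '2018-12-02 21:30:00'],
    'Departure': ['2018-11-26 08:30:00', '2018-12-02 08:00:00'],
})

# Convert every column from strings to datetime64 values.
tp = tp.apply(pd.to_datetime)

# One ban range from db, used here as a single window.
start = pd.Timestamp('2018-12-01 23:00:00')
end = pd.Timestamp('2018-12-02 21:00:00')

# Column-wise check: is each value inside [start, end]?
in_window = tp.apply(lambda col: col.between(start, end))

# Keep the rows where at least one of the three columns is inside the window.
result = tp[in_window.any(axis=1)]
print(result)
```

For several ranges, the same check would be repeated once per row of db, as the answer suggests.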

Answer 1: score 2, accepted, 2018-11-29 16:02:56
Answer 2: score 1, 2018-11-29 15:54:50