
Find common data between two dataframes within a specific date range

I have two dataframes, df1 and df2, built respectively from these dictionaries:

data1 = {
    'date': ['5/09/22', '7/09/22', '7/09/22', '10/09/22'],
    'second_column': ['first_value', 'second_value', 'third_value', 'fourth_value'],
    'id_number': ['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589', 'BXzadzd589'],
    'fourth_column': ['first_value', 'second_value', 'third_value', 'fourth_value'],
}

data2 = {
    'date': ['5/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22', '11/09/22'],
    'second_column': ['first_value', 'second_value', 'third_value', 'fourth_value', 'fifth_value', 'sixth_value'],
    'id_number': ['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589', 'BXV6mkjdd589', 'GGdbkz589', 'BXhshhsd589'],
    'fourth_column': ['first_value', 'second_value', 'third_value', 'fourth_value', 'fifth_value', 'sixth_value'],
}

I want to compare df2 with df1 in order to show the "id_number" values of df2 that are also in df1.

I also want to compare the two dataframes over the same date range.

For example, the shared date range between df1 and df2 should be from 5/09/22 to 10/09/22 (and not beyond).

How can I do this?

You can define a helper function that builds a dataframe from each dictionary and keeps only the rows within a given date range:

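import pandas as pd


def format(dictionary, start, end):
    """Build a dataframe from a dictionary and keep rows within a date range.

    Args:
        dictionary: dictionary to format; its "date" values are DD/MM/YY strings.
        start: start date, inclusive (DD/MM/YY).
        end: end date, inclusive (DD/MM/YY).

    Returns:
        Dataframe with "date" parsed as datetime, filtered to [start, end],
        with the index reset.

    """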
    return (
        pd.DataFrame(dictionary)
        .pipe(lambda df_: df_.assign(date=pd.to_datetime(df_["date"], format="%d/%m/%y")))
        .pipe(
            lambda df_: df_.loc[
                (df_["date"] >= pd.to_datetime(start, format="%d/%m/%y"))
                & (df_["date"] <= pd.to_datetime(end, format="%d/%m/%y")),
                :,
            ]
        ).reset_index(drop=True)
    )
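The helper above takes the start and end dates explicitly. If the bounds are not known in advance, one possible way to derive them is to take the later of the two earliest dates and the earlier of the two latest dates. The sketch below is only an illustration under that assumption; `shared_date_range` is a hypothetical name that does not appear in the original answer, and the two small frames contain only the "date" values from the question.

```python
import pandas as pd


def shared_date_range(df_a, df_b, column="date", fmt="%d/%m/%y"):
    """Return (start, end) of the date range covered by both dataframes.

    Hypothetical helper, not part of the original answer.
    """
    dates_a = pd.to_datetime(df_a[column], format=fmt)
    dates_b = pd.to_datetime(df_b[column], format=fmt)
    # The overlap starts at the later first date and ends at the earlier last date.
    start = max(dates_a.min(), dates_b.min())
    end = min(dates_a.max(), dates_b.max())
    if start > end:
        raise ValueError("the two dataframes have no dates in common")
    return start, end


# Only the "date" columns from the question are needed here.
raw1 = pd.DataFrame({"date": ["5/09/22", "7/09/22", "7/09/22", "10/09/22"]})
raw2 = pd.DataFrame(
    {"date": ["5/09/22", "7/09/22", "7/09/22", "7/09/22", "7/09/22", "11/09/22"]}
)

start, end = shared_date_range(raw1, raw2)
print(start.strftime("%d/%m/%y"), end.strftime("%d/%m/%y"))
# 05/09/22 10/09/22
```

The two formatted strings can then be passed as the start and end arguments of the helper function.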

Then, with the dictionaries you provided, here is how you can show the "id_number" values of df2 that are in df1 for the desired date range:

df1 = format(data1, "05/09/22", "10/09/22")
df2 = format(data2, "05/09/22", "10/09/22")

print(df2[df2["id_number"].isin(df1["id_number"])]["id_number"])
# Output
0      AA576bdk89
1    GG6jabkhd589
2     BXV6jabd589
Name: id_number, dtype: object
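As an alternative to isin, the same result can probably be obtained with an inner merge on "id_number", which is convenient if you also want to carry other columns along. This is a sketch rather than part of the original answer; the two frames below are stand-ins that hold only the "id_number" values left after filtering both dictionaries to 05/09/22 through 10/09/22.

```python
import pandas as pd

# Stand-ins for the date-filtered df1 and df2 (id_number column only).
df1 = pd.DataFrame(
    {"id_number": ["AA576bdk89", "GG6jabkhd589", "BXV6jabd589", "BXzadzd589"]}
)
df2 = pd.DataFrame(
    {
        "id_number": [
            "AA576bdk89",
            "GG6jabkhd589",
            "BXV6jabd589",
            "BXV6mkjdd589",
            "GGdbkz589",
        ]
    }
)

# An inner join keeps only the rows of df2 whose id_number also appears in df1.
# drop_duplicates() on the df1 side prevents repeated ids from multiplying rows.
common = df2.merge(
    df1[["id_number"]].drop_duplicates(), on="id_number", how="inner"
)
print(common["id_number"].tolist())
# ['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589']
```

Note that merge returns a new dataframe with a fresh index, whereas the isin approach keeps the original index of df2.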
