
How to merge dataframes with multiple conditions/columns

Hi, I have two dataframes that I want to merge using the columns Model, ID, and Date&Time.

Here is the first dataframe (df1):

ProductName Model       Date&Time
Jugger      2_MXAA_33   2019-08-12 14:37:00
Memz        3_MXA1_44   2019-08-12 14:37:00

Second dataframe(df2):

Company     ID   Date&Time
A_Company   2    2019-08-12 14:39:00

Model and ID should match when the first number of Model is the same as ID. Here is the expected output:

ProductName Model       Date&Time            Company    ID
Jugger      2_MXAA_33   2019-08-12 14:37:00  A_Company  2

My current solution uses merge_asof, but it only merges on Date&Time:

tol = pd.Timedelta('2 minute')
merged_df= pd.merge_asof(df1, df2.sort_values('Date&Time'), on='Date&Time', direction="nearest", tolerance=tol)

Could you please show me how to also match on the Model and ID columns, together with Date&Time? I'd appreciate any advice on this. Thank you so much.

import pandas as pd

df1 = pd.DataFrame({"ProductName": ["Jugger", "Memz"],
                    "Model": ["2_MXAA_33", "3_MXA1_44"],
                    "Date&Time": ["2019-08-12 14:37:00", "2019-08-12 14:37:00"]})
df2 = pd.DataFrame({"Company": ["A_Company"],
                    "ID": [2],
                    "Date&Time": ["2019-08-12 14:39:00"]})
# merge_asof needs real datetimes, not strings
df1['Date&Time'] = pd.to_datetime(df1['Date&Time'])
df2['Date&Time'] = pd.to_datetime(df2['Date&Time'])

I assume that the ID for each row of df1 is the first number in Model, so create that column first:

df1["ID"] = df1["Model"].str[0].astype(int)
df1
    ProductName Model       Date&Time           ID
0   Jugger      2_MXAA_33   2019-08-12 14:37:00 2
1   Memz        3_MXA1_44   2019-08-12 14:37:00 3
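
Note that .str[0] takes only the first character, so it would misread a model whose leading number has more than one digit. A minimal sketch of a more tolerant extraction, assuming the ID is always the run of digits before the first underscore (the third row below is a hypothetical value, not from the question):

```python
import pandas as pd

df1 = pd.DataFrame({"ProductName": ["Jugger", "Memz", "Wide"],
                    # the last model is hypothetical, added to show a two-digit ID
                    "Model": ["2_MXAA_33", "3_MXA1_44", "12_MXB2_55"]})

# Capture the leading digits before the first underscore and convert to int64
df1["ID"] = df1["Model"].str.extract(r"^(\d+)_", expand=False).astype("int64")
ids = df1["ID"].tolist()
```

If a Model value does not start with digits, str.extract yields NaN for that row and the astype call raises, which at least surfaces the bad row instead of silently mismatching it.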

I'm not sure how you would determine Company for df1, but as @Mark Wang suggests, pass by="ID" so that rows are matched on ID before the nearest Date&Time is chosen:

tol = pd.Timedelta('2 minute')
pd.merge_asof(df1, df2.sort_values('Date&Time'), on='Date&Time', by="ID", direction="nearest", tolerance=tol)

    ProductName Model       Date&Time           ID  Company
0   Jugger      2_MXAA_33   2019-08-12 14:37:00 2   A_Company
1   Memz        3_MXA1_44   2019-08-12 14:37:00 3   NaN
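
The expected output in the question contains only the matched row, whereas merge_asof keeps every row of the left frame. A minimal end-to-end sketch, assuming unmatched rows should simply be dropped (the names merged and matched are mine):

```python
import pandas as pd

df1 = pd.DataFrame({"ProductName": ["Jugger", "Memz"],
                    "Model": ["2_MXAA_33", "3_MXA1_44"],
                    "Date&Time": pd.to_datetime(["2019-08-12 14:37:00",
                                                 "2019-08-12 14:37:00"])})
df2 = pd.DataFrame({"Company": ["A_Company"],
                    "ID": [2],
                    "Date&Time": pd.to_datetime(["2019-08-12 14:39:00"])})

# The "by" keys must have the same dtype in both frames
df1["ID"] = df1["Model"].str[0].astype("int64")

# Both frames must be sorted on the "on" key
merged = pd.merge_asof(df1.sort_values("Date&Time"),
                       df2.sort_values("Date&Time"),
                       on="Date&Time", by="ID",
                       direction="nearest",
                       tolerance=pd.Timedelta("2 minutes"))

# Keep only the rows that found a partner in df2
matched = merged.dropna(subset=["Company"]).reset_index(drop=True)
```

Here matched holds the single Jugger row. Its Date&Time is the one from df1, because merge_asof keeps the left frame's key.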

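To match on more than one column, pass a list to by. For example, if df1 also had a Company column, the final merge would look like:

pd.merge_asof(df1, df2.sort_values('Date&Time'), on='Date&Time', by=['ID', 'Company'], direction="nearest", tolerance=tol)

As posted, though, df1 has no Company column, so this call fails, and I don't see how to determine the company for df1 from the data given.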
