简体   繁体   中英

Check if value from one dataframe exists in another dataframe and create column

I am looking to compare values of columns in two different datasets and create a column with the results that have matched.

DF1:

 Dates
0   2021-10-01
1   2021-10-02
2   2021-10-03
3   2021-10-04
4   2021-10-05
5   2021-10-06
6   2021-10-07

DF2 =

StartDate    User
2021-10-04   Doe, John
2021-10-07   Ann, Mary
2021-10-07   Doe, John

Expected Result

DF1:

 Dates            User
0   2021-10-01
1   2021-10-02
2   2021-10-03
3   2021-10-04    Doe, John
4   2021-10-05
5   2021-10-06
6   2021-10-07    Ann, Mary; Doe, John

I can see the matches using the following code:

df1= df2.loc[df2['StartDate'].isin(df1['Dates']), 'User']

And i can return a 1/0 with the following:

df1= df1.assign(result=df1['Dates'].isin(df2['StartDate']).astype(int))

However, i cannot seem to merge them both

df.join(df2.groupby('StartDate')['User'].apply('; '.join), how='left', on='Dates').fillna('')`

Output:

>>> df
       Dates          User
0 2021-10-01              
1 2021-10-02              
2 2021-10-03              
3 2021-10-04         Test1
4 2021-10-05              
5 2021-10-06              
6 2021-10-07  Test2, Test1

You want to do 2 things:

  1. Combine the DF2 results on the date
  2. Join or merge DF2 with DF1 on the date

Start with this answer to aggregate your DF2 into the desired form (ie, to have Test2, Test 1 in col User):

How to combine multiple rows into a single row with pandas

Take the result, make sure it's a Dataframe, and use join (or merge!) to now combine things appropriately following the example given here:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM