
Find where three separate DataFrames overlap and create a new DataFrame

I have three separate DataFrames. Each DataFrame has the same columns, ['Email', 'Rating'] . There are duplicate row values in all three DataFrames for the column Email . I'm trying to find the emails that appear in all three DataFrames and then create a new DataFrame from those rows. So far I have saved all three DataFrames to a list, dfs = [df1, df2, df3] , and then concatenated them using df = pd.concat(dfs) . I tried using groupby from here, but to no avail. Any help would be greatly appreciated.

You want to do a merge. Similar to a join in SQL, you can do an inner merge and treat the email like a foreign key. Here are the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html It would look something like this:

in_common = pd.merge(df1, df2, on=['Email'], how='inner')
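The line above only covers two DataFrames, and because Email contains duplicates, merging the full frames directly would multiply the matching rows. One possible way to extend the idea to all three is sketched below: inner-merge only the de-duplicated Email columns to get the shared emails, then use that result to filter the concatenated data. The sample data and the names email_frames and common are made up for illustration.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data; the real df1, df2, df3 come from the question.
df1 = pd.DataFrame({"Email": ["a@x.com", "b@x.com", "a@x.com"], "Rating": [1, 2, 3]})
df2 = pd.DataFrame({"Email": ["a@x.com", "c@x.com"], "Rating": [4, 5]})
df3 = pd.DataFrame({"Email": ["a@x.com", "b@x.com"], "Rating": [6, 7]})
dfs = [df1, df2, df3]

# Keep one row per email in each frame so the merges cannot multiply rows.
email_frames = [d[["Email"]].drop_duplicates() for d in dfs]

# Chain inner merges: only emails present in every frame survive.
common = reduce(
    lambda left, right: pd.merge(left, right, on="Email", how="inner"),
    email_frames,
)

# Pull every original row (from any of the three frames) with a shared email.
in_common = pd.concat(dfs, ignore_index=True).merge(common, on="Email", how="inner")
print(in_common)
```

Since the chained merge works on the list dfs, the same sketch should carry over to more than three DataFrames without changes.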

You could try using .isin from pandas, e.g.:

df1[df1['Email'].isin(df2['Email'])]

This retrieves the rows whose Email value also appears in df2['Email'] .
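For three DataFrames, the same .isin check can be combined with & so that an email has to appear in every frame. The following is a rough sketch with made-up sample data; mask, common_emails and result are illustrative names, not from the question.

```python
import pandas as pd

# Hypothetical sample data; the real df1, df2, df3 come from the question.
df1 = pd.DataFrame({"Email": ["a@x.com", "b@x.com", "a@x.com"], "Rating": [1, 2, 3]})
df2 = pd.DataFrame({"Email": ["a@x.com", "c@x.com"], "Rating": [4, 5]})
df3 = pd.DataFrame({"Email": ["a@x.com", "b@x.com"], "Rating": [6, 7]})

# True for rows of df1 whose email also appears in both df2 and df3.
mask = df1["Email"].isin(df2["Email"]) & df1["Email"].isin(df3["Email"])
common_emails = df1.loc[mask, "Email"].unique()

# Keep every row, from any of the three frames, that has a shared email.
df = pd.concat([df1, df2, df3], ignore_index=True)
result = df[df["Email"].isin(common_emails)]
print(result)
```

If only the rows from df1 are wanted, df1[mask] is enough and the concatenation step can be dropped.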

Another idea is to try an inner merge.

Good luck, and post your code next time.
