Find matching and non-matching records in pandas

I have two CSV files with data laid out as shown below. How can I perform a basic match and produce a result like the output table? I'm matching on the website field; that is the key I'm using.

I tried the approaches in "Efficiently find matching rows (based on content) in a pandas DataFrame" and https://macxima.medium.com/python-retrieve-matching-rows-from-two-dataframes-d22ad9e71879, but I'm not getting my desired output. Any assistance would be helpful.

file1.csv
| id | web_1  |
|----|------|
| 1  | google.com |
| 2  | microsoft.in |
| 3  | yahoo.uk |
| 4  | adobe.us |


file2.csv
| id | web_2 |
|----|-----|
| 2  | microsoft.in |
| 3  | yahoo.uk |
| 4  | adobe.us |


output 
| id | web_1  | web_2  |
|----|------|--------|
| 1  | google.com | |
| 2  | microsoft.in | microsoft.in |
| 3  | yahoo.uk | yahoo.uk |
| 4  | adobe.us | adobe.us |

Based on your comment, if you want to merge the DataFrames so that the result only includes rows where the merge keys match, you can do an inner join.

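pandas.DataFrame.merge uses 'inner' as the default type of merge. When no on argument is given, it joins on the columns the two DataFrames have in common, which here is id.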

import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)

>>> pd.merge(df1, df2)
   id         web_1         web_2
0   2  microsoft.in  microsoft.in
1   3      yahoo.uk      yahoo.uk
2   4      adobe.us      adobe.us
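The inner join above drops id 1, because google.com has no partner in df2. If the goal is instead the output table from the question, where the unmatched row is kept with an empty web_2, a left join is one way to get there. The following is a minimal sketch under that assumption; the variable name left and the fillna("") step are illustrative choices, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)

# how="left" keeps every row of df1; rows without a match in df2
# get NaN in the columns that come from df2.
left = pd.merge(df1, df2, on="id", how="left")

# Assumption: a blank cell is wanted instead of NaN, as in the question's table.
left["web_2"] = left["web_2"].fillna("")

print(left)
```

Passing on="id" explicitly makes the join key visible instead of relying on the shared column name.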

If you don't want to keep both web columns you can just drop one of them:

>>> pd.merge(df1, df2).drop(columns='web_2')
   id         web_1
0   2  microsoft.in
1   3      yahoo.uk
2   4      adobe.us

Drop and rename:

>>> pd.merge(df1, df2).drop(columns='web_2').rename(columns={'web_1': 'web'})
   id           web
0   2  microsoft.in
1   3      yahoo.uk
2   4      adobe.us
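The examples above join on id, while the question names the website field as the key. As a hedged sketch of how matching and non-matching records could be separated on that key, the snippet below merges web_1 against web_2 with indicator=True, which adds a _merge column marking each row as both, left_only, or right_only. The names merged, matching, and non_matching are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)

# Join on the website columns. Only web_2 is taken from df2 so the
# result does not end up with two id columns (id_x / id_y).
merged = pd.merge(
    df1,
    df2[["web_2"]],
    left_on="web_1",
    right_on="web_2",
    how="outer",      # keep rows from both sides, matched or not
    indicator=True,   # adds the "_merge" column
)

# Rows whose website appears in both frames.
matching = merged[merged["_merge"] == "both"]

# Rows whose website appears in only one of the frames.
non_matching = merged[merged["_merge"] != "both"]

print(matching)
print(non_matching)
```

With the sample data, the only non-matching record is google.com, flagged as left_only because it exists in df1 but not in df2.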


 