I have two CSV files with data in the format shown below. How can I perform a basic match and produce a result like the output below? I'm matching on the website field; that is the key I'm using.
I tried "Efficiently find matching rows (based on content) in a pandas DataFrame" and https://macxima.medium.com/python-retrieve-matching-rows-from-two-dataframes-d22ad9e71879
but I'm not getting my desired output. Any assistance would be helpful.
file1.csv
| id | web_1 |
|----|------|
| 1 | google.com |
| 2 | microsoft.in |
| 3 | yahoo.uk |
| 4 | adobe.us |
file2.csv
| id | web_2 |
|----|-----|
| 2 | microsoft.in |
| 3 | yahoo.uk |
| 4 | adobe.us |
output
| id | web_1 | web_2 |
|----|------|--------|
| 1 | google.com | |
| 2 | microsoft.in | microsoft.in |
| 3 | yahoo.uk | yahoo.uk |
| 4 | adobe.us | adobe.us |
Based on your comment, if you want to merge the DataFrames so that the result only includes rows where the merge keys match, you can do an inner join. pandas.DataFrame.merge uses 'inner' as the default type of merge.
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
>>> pd.merge(df1, df2)
id web_1 web_2
0 2 microsoft.in microsoft.in
1 3 yahoo.uk yahoo.uk
2 4 adobe.us adobe.us
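Note that with no `on` argument, `pd.merge` joins on the columns the two frames have in common, which here is only `id`. Since the question describes the website as the key, here is a minimal sketch of matching on the website values instead, using `left_on` and `right_on`. The variable name `matched` is only illustrative, and selecting `df2[["web_2"]]` is my assumption to avoid ending up with two `id` columns:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)

# Join on the website values rather than on `id`.
# Only the website column of df2 is passed in, so `id` comes from df1 alone.
matched = pd.merge(
    df1,
    df2[["web_2"]],
    left_on="web_1",
    right_on="web_2",
    how="inner",
)
print(matched)
```

For this sample data the result should contain the same three rows as the `id`-based merge, because the ids and websites line up in both files.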
If you don't want to keep both web columns you can just drop one of them:
>>> pd.merge(df1, df2).drop(columns='web_2')
id web_1
0 2 microsoft.in
1 3 yahoo.uk
2 4 adobe.us
Drop and rename:
>>> pd.merge(df1, df2).drop(columns='web_2').rename(columns={'web_1': 'web'})
id web
0 2 microsoft.in
1 3 yahoo.uk
2 4 adobe.us
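If you instead want the table shown in the question, where google.com is kept with an empty web_2, a left join is probably closer to what you need. The following is a rough sketch, assuming `id` is the merge key and that an empty string is an acceptable stand-in for the missing value; the name `result` is only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)

# how="left" keeps every row of df1; rows without a match in df2 get NaN in web_2.
result = pd.merge(df1, df2, on="id", how="left")

# Replace the missing value with an empty string to mirror the table in the question.
result["web_2"] = result["web_2"].fillna("")
print(result)
```

With the real files you would presumably build the frames with `pd.read_csv("file1.csv")` and `pd.read_csv("file2.csv")`, assuming they are ordinary comma-separated files with a header row.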