
Find matching and non-matching records in pandas

I have 2 CSV files with data in the format shown below. How can I perform a basic match and produce a result like the output? I'm matching on the websites field; that's the key I'm using here.

I tried "Efficiently find matching rows (based on content) in a pandas DataFrame" and https://macxima.medium.com/python-retrieve-matching-rows-from-two-dataframes-d22ad9e71879, but I'm not getting my desired output. Any assistance would be helpful.

file1.csv
| id | web_1  |
|----|------|
| 1  | google.com |
| 2  | microsoft.in |
| 3  | yahoo.uk |
| 4  | adobe.us |


file2.csv
| id | web_2 |
|----|-----|
|2| microsoft.in |
| 3  | yahoo.uk |
| 4  | adobe.us |


output 
| id | web_1  | web_2  |
|----|------|--------|
| 1  | google.com | |
| 2  | microsoft.in | microsoft.in |
| 3  | yahoo.uk | yahoo.uk |
| 4  | adobe.us | adobe.us |

Based on your comment, if you want to merge the DataFrames so that the result only includes rows where the merge keys match, you can do an inner join.

`pandas.DataFrame.merge` uses `'inner'` as its default type of merge.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
df2 = pd.DataFrame(
    {
        "id": [2, 3, 4],
        "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"],
    }
)
```

```python
>>> pd.merge(df1, df2)
   id         web_1         web_2
0   2  microsoft.in  microsoft.in
1   3      yahoo.uk      yahoo.uk
2   4      adobe.us      adobe.us
```
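One detail worth checking: when `merge` is called without `on`, it joins on the columns the two frames have in common, which here is only `id`, not the website columns. With this sample data the ids and websites happen to line up, so the result is the same. If the key really should be the website, a minimal sketch (assuming the same `df1`/`df2` as above) names the key columns explicitly:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"id": [1, 2, 3, 4], "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"]}
)
df2 = pd.DataFrame({"id": [2, 3, 4], "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"]})

# Without `on`, merge would use the shared column names; here that is only "id".
shared = sorted(set(df1.columns) & set(df2.columns))
print(shared)

# Match on the website columns instead. df2's "id" is dropped first so the
# result does not end up with id_x / id_y suffixed columns.
matched = df1.merge(df2.drop(columns="id"), left_on="web_1", right_on="web_2")
print(matched)
```

Both key columns (`web_1` and `web_2`) are kept in the result because their names differ.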

If you don't want to keep both web columns, you can just drop one of them:

```python
>>> pd.merge(df1, df2).drop(columns='web_2')
   id         web_1
0   2  microsoft.in
1   3      yahoo.uk
2   4      adobe.us
```

Drop one column and rename the other:

```python
>>> pd.merge(df1, df2).drop(columns='web_2').rename(columns={'web_1': 'web'})
   id           web
0   2  microsoft.in
1   3      yahoo.uk
2   4      adobe.us
```
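The desired output in the question keeps `google.com` with an empty `web_2`, which corresponds to a left join rather than an inner one. A minimal sketch, assuming the same sample frames and treating the website columns as the key, combines `how="left"` with `indicator=True` to get the full table plus the matching and non-matching rows separately:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"id": [1, 2, 3, 4], "web_1": ["google.com", "microsoft.in", "yahoo.uk", "adobe.us"]}
)
df2 = pd.DataFrame({"id": [2, 3, 4], "web_2": ["microsoft.in", "yahoo.uk", "adobe.us"]})

# Left join keeps every row of df1; web_2 is NaN where no website matched.
# indicator=True adds a "_merge" column holding "both" or "left_only".
merged = df1.merge(
    df2.drop(columns="id"),
    left_on="web_1",
    right_on="web_2",
    how="left",
    indicator=True,
)

# Split into matching and non-matching records.
matching = merged[merged["_merge"] == "both"].drop(columns="_merge")
non_matching = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Full table like the desired output, with a blank instead of NaN.
output = merged.drop(columns="_merge").fillna({"web_2": ""})
print(output)
```

Switching to `how="outer"` would also surface websites that exist only in file2.csv; those rows are marked `"right_only"` in `_merge`.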
