Compare a column in one dataframe with many columns in another dataframe (pandas)

I have two dataframes:

df1:

     ID       name1
0    ''    'company-1'
1    ''    'company2'
2    ''    'company 3'

df2:

     ID      name2       name3        name4
0    '1'   'company1'  'company.1'  'company-1'
1    '2'   'company2'  'company.2'  'company-2'

I want to compare df1['name1'] to the name columns in df2 and, wherever there is a match, put the corresponding ID from df2 into the ID column of df1.

I did this:

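for i in range(len(df1)):
    for j in range(len(df2)):
        # Assign through df1.loc[row_label, column]; chained indexing such as
        # df1.iloc[i]['ID'] = ... writes to a temporary copy and may not update df1
        if df1.iloc[i]['name1'] == df2.iloc[j]['name2']:
            df1.loc[df1.index[i], 'ID'] = df2.iloc[j]['ID']
            break
        elif df1.iloc[i]['name1'] == df2.iloc[j]['name3']:
            df1.loc[df1.index[i], 'ID'] = df2.iloc[j]['ID']
            break
        elif df1.iloc[i]['name1'] == df2.iloc[j]['name4']:
            df1.loc[df1.index[i], 'ID'] = df2.iloc[j]['ID']
            break
        else:
            # no match in this row of df2 (yet): leave the ID empty
            df1.loc[df1.index[i], 'ID'] = ''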

The expected result is:

     ID       name1
0    '1'    'company-1'
1    '2'    'company2'
2    ''    'company 3'

It works, but it's extremely inefficient and can take hours to run. Can you please help me?

I'm sorry if the question doesn't meet the required criteria. It's my first time posting here. I would also appreciate some tips on that.

This can be tackled in many ways. You can use a row-wise apply, convert the second frame into a mapping/lookup table (a Python dict), or join the two frames. Here's an example of the latter:

import pandas as pd

# The given input data
data_1 = {"ID": ["", "", ""], "name1": ["company-1", "company2", "company 3"]}
data_2 = {"ID": ["1", "2"],
          "name2": ["company1", "company2"],
          "name3": ["company.1", "company.2"],
          "name4": ["company-1", "company-2"]}

df_1 = pd.DataFrame(data_1)
df_2 = pd.DataFrame(data_2)

# Changing the second frame into "long format" and only keeping the "ID" and "potential_matches" variables
unpivoted: pd.DataFrame = df_2.melt("ID", value_name="potential_matches")[["ID", "potential_matches"]]

# Merging and tidying up
expected = (df_1
            .merge(unpivoted, how="left", left_on=["name1"], right_on=["potential_matches"])
            .drop(columns=["ID_x", "potential_matches"])
            .rename(columns={"ID_y": "ID"})[["ID", "name1"]])

print(expected)

If performance is still a problem, you can try matching name1 against a multi-index built from name2, name3 and name4.

Output:

ID   name1
1    company-1
2    company2
nan  company 3
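For the lookup-table route mentioned at the start of the answer, here is a minimal sketch using the same sample data. It assumes each name maps to at most one ID (if a name appears under several IDs, the dict simply keeps the last one). The names `long_format` and `lookup` are illustrative. It builds a `name -> ID` dict from the melted df_2 and applies it with `Series.map`, which avoids both the nested loop and the merge's suffix clean-up. Unmatched names come back as NaN and are filled with `""` to match the expected result.

```python
import pandas as pd

df_1 = pd.DataFrame({"ID": ["", "", ""], "name1": ["company-1", "company2", "company 3"]})
df_2 = pd.DataFrame({"ID": ["1", "2"],
                     "name2": ["company1", "company2"],
                     "name3": ["company.1", "company.2"],
                     "name4": ["company-1", "company-2"]})

# Stack name2, name3 and name4 into one "name" column, keeping each row's ID
long_format = df_2.melt(id_vars="ID", value_name="name")

# Build a name -> ID lookup table (assumption: a name never belongs to two IDs;
# if it does, the last occurrence wins)
lookup = dict(zip(long_format["name"], long_format["ID"]))

# Look up every name1 in the dict; names with no match become NaN,
# which fillna("") turns back into the empty-string placeholder
df_1["ID"] = df_1["name1"].map(lookup).fillna("")

print(df_1)
```

Each lookup is a dict (hash) access, so the work grows roughly with the number of rows in each frame rather than with their product, as in the nested loop.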
