简体   繁体   中英

Find a partial string match from one excel column in another excel column using python/pandas

I have two excel spreadsheets loaded into 2 different dataframes. One column(System) in spreadsheet 1 has a system code which I need to match with another column(Description) in spreadsheet 2. The second column has a description which may or may not have system in it along with other string. If match is found I want to append spreadsheet 2 with a new column that has the system code.

df1 = pd.DataFrame(
    {
        "System": ["HFW", "SYS", "ABC"],
        "Description": ["HFW Description", "Sys Description", "ABC Description"],
    }
)

df2 = pd.DataFrame(
    {
        "Description": [
            "Amount spent for HFW",
            "Spending amt on XYZ",
            "INV20563BG",
            "ABC Cost",
            "ABC Cost 2",
        ],
        "Amount": ["150", "175", "160", "180", "100"],
    }
)

So basically need to metch 'System' column from DF1 to 'Description' in DF2. DF1 and DF2 could have more columns and different # of rows.

Tried these options:

df1["MatchingSystem"] = df1.System.apply(
    lambda x: 1 if df2["Description"].str.contains(x) else 0
)

Tried a few other things as well. Any help is appreciated

You can compare 2 list of strings and write the first match:

sys_values = df1.System.values

df2["MatchingSystem"] = df2.Description.apply(
    lambda x: next((sys for sys in sys_values if sys in x.split()), None)
)

The resulting dataframe df2 is:

            Description Amount MatchingSystem
0  Amount spent for HFW    150            HFW
1   Spending amt on XYZ    175           None
2            INV20563BG    160           None
3              ABC Cost    180            ABC
4            ABC Cost 2    100            ABC

I create the new column in the df2 dataframe because it seems like one df1.System can be in multiple df2.Description .

It's a bit dirty but I think it does the work. Let me know if you have any problem or question.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM