Find a partial string match from one Excel column in another Excel column using Python/pandas

I have two Excel spreadsheets loaded into two different dataframes. One column (System) in spreadsheet 1 holds a system code, which I need to match against another column (Description) in spreadsheet 2. Each description is free text that may or may not contain a system code alongside other words. If a match is found, I want to add a new column to spreadsheet 2 containing the matching system code.

import pandas as pd

df1 = pd.DataFrame(
    {
        "System": ["HFW", "SYS", "ABC"],
        "Description": ["HFW Description", "Sys Description", "ABC Description"],
    }
)

df2 = pd.DataFrame(
    {
        "Description": [
            "Amount spent for HFW",
            "Spending amt on XYZ",
            "INV20563BG",
            "ABC Cost",
            "ABC Cost 2",
        ],
        "Amount": ["150", "175", "160", "180", "100"],
    }
)

So basically I need to match the 'System' column from DF1 against 'Description' in DF2. DF1 and DF2 could have more columns and different numbers of rows.

I tried the following:

df1["MatchingSystem"] = df1.System.apply(
    lambda x: 1 if df2["Description"].str.contains(x) else 0
)

I tried a few other things as well. Any help is appreciated.

You can compare the words of each description against the list of system codes and keep the first match:

sys_values = df1.System.values

df2["MatchingSystem"] = df2.Description.apply(
    lambda x: next((sys for sys in sys_values if sys in x.split()), None)
)

The resulting dataframe df2 is:

            Description Amount MatchingSystem
0  Amount spent for HFW    150            HFW
1   Spending amt on XYZ    175           None
2            INV20563BG    160           None
3              ABC Cost    180            ABC
4            ABC Cost 2    100            ABC
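
As a possible alternative, the same lookup can be sketched with a single regular expression and `Series.str.extract`, which avoids the Python-level loop over codes. This is only a sketch under a few assumptions: the codes are plain words, matching on word boundaries is acceptable, and the trimmed sample frames below stand in for the real spreadsheets. Unlike the `split()` version, it returns the leftmost code found in the description and leaves `NaN` (not `None`) where nothing matches.

```python
import re

import pandas as pd

# Trimmed copies of the sample data from the question.
df1 = pd.DataFrame({"System": ["HFW", "SYS", "ABC"]})
df2 = pd.DataFrame(
    {
        "Description": [
            "Amount spent for HFW",
            "Spending amt on XYZ",
            "INV20563BG",
            "ABC Cost",
            "ABC Cost 2",
        ]
    }
)

# Build one alternation pattern, e.g. r"\b(HFW|SYS|ABC)\b".
# re.escape guards against codes that contain regex metacharacters.
pattern = r"\b(" + "|".join(re.escape(code) for code in df1["System"]) + r")\b"

# With one capture group and expand=False, str.extract returns a Series
# holding the first captured code per row, or NaN when no code is present.
df2["MatchingSystem"] = df2["Description"].str.extract(pattern, expand=False)

print(df2)
```

Word boundaries also match a code that is followed by punctuation (for example "HFW,"), which the whitespace split would miss, so the two versions can differ on messy descriptions.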

I created the new column in df2 rather than df1 because a single df1.System value can appear in several df2.Description entries.
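
If a per-system flag in df1 is still wanted, the attempt from the question can probably be adapted. As written it fails because `str.contains` returns a boolean Series, and using a Series directly in `if ... else` raises a `ValueError` about an ambiguous truth value. A minimal sketch, assuming plain substring matching is good enough and using trimmed copies of the sample frames:

```python
import pandas as pd

# Trimmed copies of the sample data from the question.
df1 = pd.DataFrame({"System": ["HFW", "SYS", "ABC"]})
df2 = pd.DataFrame(
    {
        "Description": [
            "Amount spent for HFW",
            "Spending amt on XYZ",
            "INV20563BG",
            "ABC Cost",
            "ABC Cost 2",
        ]
    }
)

# str.contains gives one boolean per description, so reduce it with .any()
# to get a single True/False per system code, then store it as 1 or 0.
# regex=False treats each code as a literal substring, not a pattern.
df1["MatchingSystem"] = df1["System"].apply(
    lambda code: int(df2["Description"].str.contains(code, regex=False).any())
)

print(df1)
```

Note that this is a substring test, so a code would also be flagged when it only appears inside a longer word.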

The `apply`/`next` one-liner is a bit crude, but I think it does the job. Let me know if you have any problems or questions.


