How to conditionally (based on a value in one column) search for a substring in another column in a Python loop

Question

I need to do a substring search in a string, conditioned on the value in a second column. I have 2 dataframes:

import pandas as pd

df1 = {'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical", "Geeku, electrical","VALVE, OVERBOARD, BUTTERFLY"],
        'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU","VALVE"],
        }
df2 = {'N_Product': ["VALVE", "VALVE","VALVE", "PUMP", "GEEKU"],
        'M_Product': ["PRESSURE", "qwerty","", "", "ELECTRICAL"],
        }
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

(Step 1) For the first row in df1, the N_Product value is VALVE.

(Step 2) We look for VALVE in the N_Product column of every row of df2 and find 3 matches, with the following (N_Product, M_Product) pairs: row 0 has VALVE, PRESSURE; row 1 has VALVE, qwerty; row 2 has VALVE, "".

(Step 3) Then we need to check whether the M_Product value of any of these pairs (from df2) is contained in df1['Descr']; if it is, write N_Product + ":" + M_Product + ";". For VALVE you only need to search for "Pressure", "Electrical" and "", nothing else; for N_Product 'GEEKU', only 'Electrical'; and so on, depending on which pairs are in df2.
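Steps 1 and 2 amount to a lookup from each N_Product to the M_Product values paired with it in df2. A minimal sketch of that lookup, assuming the df2 from the question (the name `candidates` is only illustrative), groups df2 by N_Product once instead of rescanning it for every row of df1:

```python
import pandas as pd

df2 = pd.DataFrame({'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"],
                    'M_Product': ["PRESSURE", "qwerty", "", "", "ELECTRICAL"]})

# Map each N_Product to the list of M_Product values paired with it in df2
candidates = df2.groupby('N_Product')['M_Product'].apply(list).to_dict()

print(candidates['VALVE'])
print(candidates['GEEKU'])
```

Step 3 then only needs to test the values in `candidates[row's N_Product]`. Keep in mind that an empty string is a substring of every string (`'' in 'abc'` is `True`), so an empty M_Product always "matches" and is best treated as "no suffix".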

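import numpy as np

# All M_Product values from df2, regardless of their N_Product
c = df2['M_Product'].astype(str).to_list()

def matcher(x):
    # Return the first M_Product found (case-insensitively) in the description
    for i in c:
        if i.lower() in x.lower():
            return i
    # No candidate matched
    return np.nan

df1['Res'] = df1['Descr'].apply(matcher)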

but I don't know how to loop over only the M_Product values that correspond to each row's N_Product.

Desired result:

df1 = {'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical", "Geeku, electrical","VALVE, OVERBOARD, BUTTERFLY"],
        'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU","VALVE"],
        'Result': ["VALVE: PRESSURE;", "PUMP", "VALVE;", "GEEKU: ELECTRICAL;","VALVE;"],
        }

I would be grateful for any help. If you have any ideas, please help.

Answer 1

(UPDATED)

Based on the updated question, my understanding of what's being asked is this:

Create a new Result column.
If the N_Product values in df1 and df2 match for a given row, take the first value in df2's M_Product column that is found in that row's Descr in df1, and append it to df1's N_Product with an intervening : character.
Otherwise, put N_Product from df1 in the Result column.
Also append a ; character to what is put in Result.

Here is a way to do that:

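import pandas as pd

def foo(x):
    # Upper-case the description so the search is case-insensitive
    descr = x['Descr'].upper()
    match = None
    # Scan every M_Product value in df2 and keep the first one found in Descr
    for mStr in df2['M_Product'].fillna('').str.upper():
        # Skip empty strings: '' is a substring of every string
        if mStr and mStr in descr:
            match = mStr
            break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'

# Row-by-row (index-aligned) comparison of the two N_Product columns
mask = df1['N_Product'] == df2['N_Product']
df1.loc[mask, 'Result'] = df1.apply(foo, axis=1)
df1.loc[~mask, 'Result'] = df1['N_Product'] + ';'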

Explanation:

Create a boolean Series mask that is True for the rows of df1 whose N_Product matches the value in the same row of df2.
For rows of df1 where mask is True, use apply to call foo, which finds the first value (if any) in df2's M_Product column that occurs in the row's Descr and packages it as a string of the form N_Product: M_Product; if found, otherwise just N_Product;.
For rows of df1 where mask is False (namely ~mask), set the Result column to N_Product;.
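Note that mask compares the two N_Product columns position by position (pandas aligns the Series on their index), not by membership. A quick check using only the N_Product columns of the input shown below, assuming both frames keep the default RangeIndex (comparing Series with different index labels raises a ValueError):

```python
import pandas as pd

df1 = pd.DataFrame({'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"]})
df2 = pd.DataFrame({'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"]})

# Series == Series aligns on the index, so row i of df1 is compared with row i of df2
mask = df1['N_Product'] == df2['N_Product']
print(mask.tolist())  # [True, False, True, False, False]
```

This is why rows 1, 3 and 4 fall into the ~mask branch and get just N_Product; in the output.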

Input:

df1:
                         Descr N_Product
0              VALVE, PRESSURE     VALVE
1                     pump ttf      PUMP
2            Valve, electrical     VALVE
3            Geeku, electrical     GEEKU
4  VALVE, OVERBOARD, BUTTERFLY     VALVE

df2:
  N_Product   M_Product
0     VALVE    PRESSURE
1     VALVE  ELECTRICAL
2     VALVE
3      PUMP
4     GEEKU         MBA

Output:

                         Descr N_Product              Result
0              VALVE, PRESSURE     VALVE    VALVE: PRESSURE;
1                     pump ttf      PUMP               PUMP;
2            Valve, electrical     VALVE  VALVE: ELECTRICAL;
3            Geeku, electrical     GEEKU              GEEKU;
4  VALVE, OVERBOARD, BUTTERFLY     VALVE              VALVE;

UPDATE #2:

Here's a solution based on relaxed matching criteria for N_Product:

Create a new Result column.
For each row in df1, if its N_Product value is found anywhere in df2's N_Product column, take the first value in df2's M_Product column that is found in that row's Descr, and append it to the N_Product value with an intervening : character.
Otherwise, put N_Product from df1 in the Result column.
Also append a ; character to what is put in Result.

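import pandas as pd

def foo(x):
    descr = x['Descr'].upper()
    match = None
    # Only search M_Product if this N_Product appears anywhere in df2
    if x['N_Product'].upper() in list(df2['N_Product'].str.upper()):
        for mStr in df2['M_Product'].fillna('').str.upper():
            # Skip empty strings: '' is a substring of every string
            if mStr and mStr in descr:
                match = mStr
                break
    if match is None:
        return x['N_Product'] + ';'
    else:
        return x['N_Product'] + ': ' + match + ';'

df1['Result'] = df1.apply(foo, axis=1)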
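Both versions above scan every M_Product value in df2, whatever N_Product it is paired with. If the goal is the one stated in the question (search only the M_Product values paired with the row's N_Product), one possible sketch, using the question's original df1/df2, precomputes a per-product list and looks it up per row. The names `candidates` and `build_result` are illustrative, not part of the answer; empty M_Product values are dropped up front because '' matches every description.

```python
import pandas as pd

df1 = pd.DataFrame({'Descr': ["VALVE, PRESSURE", "pump ttf", "Valve, electrical",
                              "Geeku, electrical", "VALVE, OVERBOARD, BUTTERFLY"],
                    'N_Product': ["VALVE", "PUMP", "VALVE", "GEEKU", "VALVE"]})
df2 = pd.DataFrame({'N_Product': ["VALVE", "VALVE", "VALVE", "PUMP", "GEEKU"],
                    'M_Product': ["PRESSURE", "qwerty", "", "", "ELECTRICAL"]})

# For each N_Product, the non-empty M_Product values paired with it in df2
candidates = (df2[df2['M_Product'].fillna('') != '']
              .groupby('N_Product')['M_Product']
              .apply(list)
              .to_dict())

def build_result(row):
    descr = row['Descr'].upper()
    # Only test the M_Product values that belong to this row's N_Product
    for m in candidates.get(row['N_Product'].upper(), []):
        if m.upper() in descr:
            return row['N_Product'] + ': ' + m.upper() + ';'
    return row['N_Product'] + ';'

df1['Result'] = df1.apply(build_result, axis=1)
print(df1['Result'].tolist())
```

On this data the result is VALVE: PRESSURE;, PUMP;, VALVE;, GEEKU: ELECTRICAL;, VALVE;, which matches the desired result except that PUMP also gets the trailing ; used for the other unmatched rows.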


1 answer

Answer 1
1, accepted, 2022-05-18 15:36:34